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ABSTRACT 


We  investigate  the  effects  of  noise  on  developing  predictions  of  ill-conditioned 
systems  from  measurements.  In  particular  we  investigate  collinearity  between 
measurement  devices.  We  assume  the  system  is  linear  in  the  measurements 
taken,  and  that  the  measurement  noise  is  uncorrelated  both  to  the  true  mea¬ 
surements  and  to  other  measurement  noises.  The  “matrix’’  and  “vector”  tech¬ 
niques  (two  stress  transfer  function  techniques  developed  earlier)  are  analysed. 
The  matrix  technique  produces  better  results,  but  requires  external  system  in¬ 
formation  during  calibration.  On  the  other  hand,  the  vector  technique  (based 
on  least  squares)  is  easily  implemented  (no  knowledge  of  external  information 
is  required),  but  is  sensitive  to  ill-conditioned  configurations  of  measuring  de¬ 
vices.  The  vector  technique  is  shown  to  be  the  well-known  errors-in-variable 
model,  and  hence  unbiased  but  inconsistent,  which  explains  the  large  errors  it 
produces.  Although  a  correction  to  the  vector  technique  improves  results,  it 
is  still  not  as  accurate  as  the  matrix  technique.  This  vector  correction  addi¬ 
tionally  requires  estimates  of  noise  in  the  measuring  devices,  and  suffers  from 
sensitivity  to  noise  estimation  errors.  The  surrogate-matrix  technique  sub¬ 
stitutes  internal  for  external  system  information,  circumventing  the  need  for 
external  system  measurements.  Simulation  results  involving  a  simple  truss 
support  all  theoretical  findings.  The  surrogate-matrix  and  vector  techniques 
are  recommended  for  ill-  and  well-conditioned  systems  respectively. 
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Effects  of  Noise  on  Almost  Collinear  Systems 


EXECUTIVE  SUMMARY 

Many  aircraft  components  have  a  life  span  that  is  smaller  than  the  design  life  of  the 
aircraft.  As  such  these  components  have  to  be  replaced  when  their  respective  component 
retirement  times  (CRTs)  are  reached.  The  CRTs  are  calculated  using  the  assumed  loading 
spectrum  together  with  experimentally  derived  component  fatigue  strength  data.  How¬ 
ever.  as  the  loading  spectrum  is  assumed  rather  than  being  measured,  the  CRTs  must 
be  based  on  a  worst-case  spectrum  and  tend  to  be  overly  conservative,  resulting  in  the 
retirement  of  most  components  well  before  their  true  “use-by-date” .  On  the  other  hand, 
some  components  may  be  particularly  sensitive  to  changes  in  the  design  loading  spectrum. 
In  such  cases,  any  deviation  of  the  actual  load  spectrum  from  the  design  spectrum  may 
lead  to  the  safe  fatigue  life  being  reached  before  the  CRT.  An  accurate  prediction  of 
the  loading  spectrum,  through  actual  measurements,  has  the  potential  not  only  to  reduce 
operating  costs,  but  also  to  increase  safety. 

Most  fatigue  life-limited  components  on  helicopters  are  located  on  the  rotor  system  and 
its  drive  train,  and  hence  are  dynamic  in  nature.  However,  due  to  the  harsh  working  con¬ 
ditions  on  these  rotor  components,  measurement  instrumentation  tends  to  be  short  lived. 
In  addition,  the  transmission  of  measurements  on  rotor  components  to  a  storage  device  is 
itself  non- trivial.  One  way  to  circumvent  these  direct  measurement  problems  is  to  develop 
transfer  functions  between  measurements  on  fixed  helicopter  components  and  measure¬ 
ments  on  dynamic  (rotor)  components.  Knowing  some  set  of  parameter  values  measured 
on  the  fixed  airframe  then  allows  us  to  determine  the  loading  on  rotor  components  using 
these  transfer  functions. 

Two  assumptions  are  made  in  deriving  the  main  results;  (i)  that  the  predictive  system 
is  linear  and  (ii)  that  the  noise  is  uncorrelated.  We  compare  and  contrast  four  different 
techniques  that  develop  these  transfer  functions;  the  vector,  matrix,  corrected- vector,  and 
surrogate- matrix  techniques.  The  construction  of  these  transfer  functions  has  two  main 
difficulties,  measurement  noise  and  measurement  collinearity.  We  find  that  although  the 
vector  technique  is  easily  implemented,  it  lacks  accuracy  for  ill-conditioned  and  noisy  sys¬ 
tems.  We  show  when  and  why  these  inaccuracies  arise  and  how  to  correct  them.  The 
matrix  technique  produces  good  results,  but  requires  the  measurement  of  external  loads 
during  calibration.  If  a  good  estimation  of  noise  is  available  then  the  corrected-vector 
technique  improves  upon  the  results  of  the  vector  technique  (however,  it  still  does  not 
match  the  accuracy  achieved  by  the  matrix  technique).  The  surrogate-matrix  technique 
substitutes  measurements  from  additional- sensors  for  external-loads,  achieving  an  accu¬ 
racy  that  matches  the  matrix  technique.  Simulations  on  a  simple  truss  support  the  theory 
presented. 

The  results  developed  in  this  report  are  not  merely  restricted  to  helicopters  (or  indeed 
aircraft).  The  results  are  applicable  to  any  linear  system  where  a  predictive  capability  is 
to  be  developed  from  system  measurements. 

The  investigation  herein  of  noise  and  redundancy  will  allow  the  development  of  more 
accurate  and  more  robust  transfer  functions,  and  hence  rotor  loading  predictions.  In 
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turn,  these  accurate  loading  predictions  have  the  potential  to  increase  safety  and  reduce 
operating  costs  for  the  Australian  Defence  Force  and  other  aircraft  operators. 
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1  Introduction 

Accurate  prediction  of  the  loading  in  critical  helicopter  components  has  the  potential 
to  significantly  increase  safety  and  reduce  operating  costs  for  the  Australian  Defence  Force 
and  other  aircraft  operators. 

Aircraft  component  retirement  times  (CRTs)  are  calculated  using  an  assumed  load 
spectrum  and  experiment  ally- derived  fatigue  strength  data.  It  is  unusual  for  aircraft  to 
experience  loading  conditions  as  severe  as  those  assumed,  so  the  CRTs  are  normally  quite 
conservative.  Nevertheless,  Schaefer  [28]  has  found  that  a  few  components  may  require 
that  their  CRTs  be  reduced  by  up  to  75%  (for  further  details  see  [24]).  This  possible 
requirement  for  CRT  reductions  has  safety  implications.  On  the  other  hand,  due  to  the 
conservative  nature  of  the  CRT  estimations  most  components  are  discarded  well  before 
their  safe  fatigue  lives  have  been  expended.  Hence,  the  accurate  prediction  of  component 
loading  has  two  main  benefits — increased  safety  and  reduced  operational  costs. 

Most  fatigue-critical  components  in  helicopters  are  located  in  the  rotor  system  and 
its  drive  train  [20,  p.  46]  where,  unlike  fixed-wing  aircraft,  they  have  single  load  paths 
and  experience  high  and  low  cycle  fatigue  loading  [17].  Unfortunately,  strain  measuring 
devices  are  short  lived  due  to  the  harsh  working  environment  of  these  critical  components. 
In  addition,  the  transmission  of  this  loading  information  from  rotating  components  to  a 
recording  system  is  itself  a  difficult  task.  In  the  past  this  information  has  been  transmitted 
either  via  sliprings  (resulting  in  excessive  noise  problems)  or  telemetry  (normally  involving 
large  cumbersome  devices).  The  difficulty  in  obtaining  reliable  rotor  component  loads  is 
demonstrated  by  the  fact  that,  historically,  these  loads  have  only  ever  been  obtained  during 
short-duration  experimental  trials. 

In  an  earlier  report  [25]  we  developed  stress  transfer  functions  (STFs),  which  related 
the  stress  in  one  section  of  an  idealised  helicopter  truss  to  the  stress  in  a  different  section  of 
the  helicopter.  Once  a  numerical  simulation  of  this  idealised  structure  was  implemented 
we  quickly  realised  that  the  strain  measurements  were  sensitive  to  several  variables  in¬ 
cluding  strain  gauge  location  and  noise  in  the  strain  measurements.  In  fact,  under  some 
configurations  of  strain  gauge  locations,  even  a  small  amount  of  strain  measurement  noise 
completely  corrupted  the  development  of  the  STFs.  We  concluded  the  discussion  in  that 
report  with  several  open-ended  questions  re-iterated  below.  (In  the  questions  shown  below 
we  refer  to  the  STF  system  as  “the  system” ,  and  the  vector  and  matrix  techniques  refer 
to  the  two  procedures  developed  in  an  earlier  report  [25]  to  determine  the  STFs.) 

•  How  does  noise  affect  the  system’s  condition  number? 

•  How  do  structural  characteristics  affect  ill-conditioning? 

•  Why  is  the  error  independent  of  the  approximation  order  for  the  vector  technique? 

•  Why  did  the  vector  technique  under-estimate  the  results? 

•  Why  does  the  matrix  technique  appear  to  be  more  stable  than  the  vector  technique? 

•  Are  there  situations  under  which  the  vector  technique  gives  results  superior  to  the 
matrix  technique? 

•  Will  a  data  analysis  method  based  on  the  bootstrap  method  improve  results? 
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In  this  report  we  answer  these  questions,  and  investigate  what  effect  measurement 
noise  has  on  predictions  of  general  linear  systems.  As  such,  this  work  is  not  restricted  to 
merely  helicopters  (or  indeed  even  aircraft).  The  results  may  be  applied  in  any  situation 
where  a  predictive  capability  is  to  be  developed  from  measured  information. 

We  begin,  in  §  2,  by  revisiting  results  relating  to  singular  value  decomposition.  In  turn, 
we  use  these  results  to  determine  the  condition  number  of  a  two-by-two  matrix  (which  is  a 
measure  of  “goodness”  for  gauge  configuration  on  helicopter  trusses).  We  conclude  the  first 
section  by  determining  when  these  gauge  configurations  are  both  well-  and  ill-conditioned. 

In  the  next  section  (§  3)  we  develop  a  simple  truss  to  investigate  problems  that  may 
arise  in  the  construction  of  transfer  functions.  This  simple  truss  will  be  used  in  numerical 
simulations,  where  external  loads  will  be  applied  to  the  truss  and  simulated  strain  gauges 
will  determine  the  loading  in  various  components.  We  review  the  vector  and  matrix 
techniques,  and  also  simulation  results  from  earlier  work. 

The  largest  section  of  this  report  (§  4)  is  devoted  to  the  analysis  of  the  least  squares 
(LS)  technique  in  determining  the  transfer  functions,  since  the  LS  procedure  is  the  basis 
of  the  vector  technique.  We  begin  §  4  by  investigating  the  effects  of  measurement  noise  on 
our  LS  prediction  of  the  transfer  function.  Two  assumptions  are  made  in  deriving  the  main 
results;  (i)  that  the  predictive  system  is  linear  and  (ii)  that  the  noise  is  uncorrelated.  The 
first  assumption  doesn’t  restrict  our  analysis  to  linear  systems;  we  assume  only  that  the 
predictive  system  is  linear  in  the  measurement  variables.  For  example,  the  two-dimensional 
predictive  system  h(f,g)  —  af(x,y)  +  bg{x,y)  (where  a  and  b  are  constants)  is  linear  in 
/  and  g ,  even  if  the  functions  f(x,y)  and  g(x,y)  are  highly  non-linear.  The  second 
assumption  is  also  non-restrictive  since  we  would  assume  that  any  true  noise  parameter 
would  be  independent  of  noise  in  all  other  parameters. 

Making  these  simplifying  assumptions  we  determine  when  the  LS  prediction  breaks 
down.  A  method  for  correcting  errors  in  LS  predictions  is  then  developed,  and  a  sensi¬ 
tivity  analysis  of  this  correction  undertaken.  The  results  from  the  preceding  subsections 
are  then  generalised  to  higher  dimensions.  We  are  then  in  a  position  to  show  why  the 
matrix  technique  outperforms  the  vector  technique.  The  relationship  between  the  condi¬ 
tion  number  (of  gauge  configuration)  and  the  correlation  (of  the  output  gauges)  is  then 
investigated.  Simulation  results  for  the  corrected- vector  technique  and  surrogate-matrix 
technique  (two  techniques  developed  to  improve  results)  support  the  theoretical  findings. 

The  total  LS  technique  is  reviewed  in  §  5.  We  show  the  relationship  between  the  LS, 
corrected  LS,  and  total  LS  techniques,  and  determine  which  of  these  techniques  is  best 
suited  to  the  development  of  the  transfer  functions. 
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2  Condition  Number  of  a  2  x  2  Matrix 

In  this  section,  we  first  revisit  the  singular  value  decomposition  technique.  The  reason 
for  doing  so  is  to  determine  a  measure  of  “gauge  configuration  goodness”  for  our  helicopter 
truss.  One  such  measure  is  given  by  the  condition  number  of  the  gauge  configuration  on  the 
truss.  As  such,  we  determine  under  what  conditions  a  simple  two-by-two  matrix  becomes 
both  well-  and  ill-conditioned,  where  this  matrix  represents  the  linear  relationship  between 
loads  in  fixed  and  dynamic  helicopter  components.  The  results  from  this  section  will  be 
used  to  investigate  the  relationship  between  system  condition  number  and  gauge  output 
correlation. 


2.1  Revisiting  Singular  Value  Decomposition 

Following  the  outline  in  Golub  and  van  Loan  [10,  p.  71],  the  singular  value  decompo¬ 
sition  (SVD)  of  a  real  ra-by-n  matrix  A  is 

A  =  UDVt ,  (2.1) 

where  the  matrix  D  is  diagonal  and  the  matrices  U  and  V  are  orthogonal,  and  have  the 
following  structure 

U=  [uuu2,...,um]GRmxm,  V=  \vuv2,...,vn}  erx", 


and 


D  =  diag(<r1} . . . ,  ap)  €  Rmxn, 

where  p  =  mi n(m,  n).  The  concept  of  a  diagonal  matrix  (that  is,  a  matrix  with  non- zero 
terms  only  along  the  diagonal)  has  been  extended  to  rectangular  matrices  in  the  above 
definition  of  D. 

The  values  cq  for  i  —  l,...,p  are  termed  singular  values.  The  vectors  Ui  for  i  = 
1, . . . ,  m  and  Vi  for  i  =  1, . . . ,  n  are  termed  left  and  right  singular  vectors  respectively.  For 
the  special  case  of  A  real  and  symmetric,  the  singular  values  and  absolute  value  of  the 
eigenvalues  of  A  become  identical,  as  do  the  singular  vectors  and  eigenvectors.  Horn  and 
Johnson  [11,  p.  213]  state  that  perhaps  this  is  the  reason  the  term  generalised  eigenvalue 
is  occasionally  used  instead  of  singular  value. 

The  singular  values  are  ordered  so  that  01  >  02  >  •  •  •  >  ap  >  0.  The  span(ui, . . . ,  ur) 
and  span(t;r-fi,  ♦  •  •  ,vn)  define  the  range  and  null  space  of  A  respectively,  where  r  is  the 
rank  of  A  such  that  <j\  >  •  •  •  >  ar  >  crr+i  =  •  •  •  =  av  ~  0.  Thus  if  A  has  full  rank,  then 
all  the  singular  values  are  non-zero  and  r  =  p. 

The  2-norm  condition  number  of  the  matrix  A,  defined  in  terms  of  the  singular  values, 
is  given  by 


that  is,  the  ratio  of  the  largest  to  smallest  singular  values.  The  2-norm  of  the  matrix  A  is 
simply  given  by  the  largest  singular  value,  that  is  ||  A\\2  =0*1.  The  square  of  the  Frobenius 
norm  is  given  by  the  sum  of  the  singular  values  squared,  that  is  ||A||^  =  o\  +  (j\  H - V  <Jp. 
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Figure  2.1  shows  a  geometric  interpretation  of  singular  values  and  vectors  for  the 
two-dimensional  case.  We  now  see  that  SVD  merely  decomposes  a  matrix  into  a  ro- 


Figure  2.1:  Transformation  of  the  unit  circle  under  the  mapping  of  a 
two-by-two  matrix  A.  The  lengths  of  the  semimajor  and  semiminor  axes 
of  the  ellipse  give  the  of  the  maximum  and  minimum  singular  values  (o\ 
and  <72 )  respectively. 


tation/reflection  (the  matrix  VT),  a  stretching  (the  diagonal  matrix  £>),  and  a  rota¬ 
tion/reflection  (the  matrix  U).  The  top  left  figure  shows  the  unit  circle,  which  represents 
all  vectors  of  unit  length;  while  the  top  right  figure  shows  the  mapping  of  the  unit  circle 
under  the  matrix  A  (a  linear  transformation).  The  singular  values  control  the  length 
of  the  ellipse’s  axes.  In  general  a  singular  value  decomposition  may  be  thought  of  as  a 
linear  mapping  from  a  hyper-sphere  to  a  hyper-ellipsoid,  the  singular  values  controlling 
the  lengths  of  the  hyper-ellipsoid’s  semi-axes.  The  left  and  right  singular  vectors  define 
the  shape-preserving  transformations.  The  different  symbols  drawn  onto  the  circles  and 
ellipses  show  the  effects  of  the  singular  vectors  (that  is,  rotations  and  reflections).  The 
transformation  A  shown  in  Figure  2.1  is  given  by  the  two-by-two  matrix  and  corresponding 
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SVD  (shown  to  two  decimal  places) 


-1.22 

-0.23 

=  udvt  = 

'  0.87 

0.50' 

'1.40 

0.00' 

'-0.92 

-0.38' 

0.48 

0.67 

-0.50 

0.87 

0.00 

0.50 

-0.38 

0.92  _ 

If  r  is  the  rank  of  A  then  the  least  squares  (LS)  solution  of  the  linear  problem  Ax  =  b 
is  given  by 

*ls  =  f]  ^ Vi  =  A+  b,  (2.2) 

(7  7 

1=1  1 

where  A+  is  the  pseudo-inverse  (defined  below).  The  solution  xLS  minimises  the  2-norm 
residual  || Ax  —  6||2,  and  the  minimum  is  given  by 

m 

\\AxLS-b\\l=  ( uIb f  ■ 

i—r-\- 1 


The  pseudo-inverse  is  defined  as 

A+  =  VD^Ut, 


where 

=  diag  e  Rnxm. 

W  °r  ) 

If  the  rank(A)  —  ft,  then  A+  =  (ATA)~1  AT ,  while  if  m  —  n  =  rank(A),  then  A+  =  A~l. 
Typically  A+  is  defined  to  be  the  unique  matrix  X  G  Rnxm  that  satisfies  the  four  Moore- 
Penrose  conditions  (see  Golub  and  van  Loan  [10,  p.  243]).  The  four  conditions  amount 
to  the  requirement  that  AA+  and  A+A  be  orthogonal  projections  onto  the  range(A) 
and  range(AT)  respectively.  Schmidt  [29]  terms  the  matrix  A+  that  satisfies  the  Moore- 
Penrose  conditions  the  generalised  inverse.  Golub  and  van  Loan  [10]  use  the  pseudo-inverse 
to  show  that  small  changes  in  A  or  b  can  induce  arbitrarily  large  changes  in  the  least 
squares  solution. 

One  approach  to  determine  the  numerical  rank  of  A  is  to  assign  a  tolerance  6  such 
that  the  singular  values  are  split  into  two  sets 


O’ 1  ^  ^  (Jf  ^  ^  ^  ^  , 

and  if  the  matrix  A  elements  have  nsd  significant  digits  one  suggested  choice  for  the 
tolerance  is  5  =  10_n5d||  AH^.  The  infinity  norm  of  a  matrix  A  G  Rmxn  with  elements  aij 
is  defined  as 

n 

l|A,i~  = 

3= 1 

We  have  seen  that  the  SVD  decomposition  of  a  matrix  is  simply  an  orthogonalisation  of 
the  matrix  into  a  sum  of  rank  one  matrices  (see  Equation  (2.2)).  From  this  decomposition 
we  gain  such  useful  information  as  the  matrix’s  rank,  distance  to  the  nearest  singular 
matrix,  condition  number,  several  matrix  norms,  LS  solution,  error  in  the  LS  solution, 
pseudo-inverse,  and  the  directions  of  the  range  and  null  space. 
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2.2  Using  SVD  to  Determine  the  Condition  Number 

In  this  subsection  we  determine  the  condition  number  of  a  two-by-two  matrix  using 
the  SVD  results  from  the  previous  subsection. 

From  the  previous  subsection  on  SVD  we  know  that  every  matrix  A  has  a  SVD  given 
by  A  =  UDVt ,  where  the  matrices  U  and  V  are  orthogonal  and  D  is  diagonal. 


a  b 
c  d 


Let  the  two-by-two  matrix  A  have  the  form 

A  = 

One  way  to  express  the  SVD  of  A  is 

,  .  cos  #  sin  9 

A=  I  .  .  a 

—  sm  6  cos  9 


0" 

0 

(J2_ 

cos  a  sm  a 
—  sin  a  cos  a 


(2.3) 


(2.4) 


since  the  matrices  involving  sinusoidal  functions  are  orthogonal,  and  obviously  the  central 
matrix  is  diagonal.  Multiplying  out  the  above  expression,  making  the  change  of  notation 
substitutions  a\  =  ai  cos#  cos  a,  a2  ~  <T2  cos  #  cos  a,  x  =  tan#,  and  y  =  tana,  and 
equating  the  result  to  A  yields  the  four  equations 


ai  -  a2xy  =  a, 

(2.5) 

criy  +  a2x  =  b, 

(2.6) 

— a\x  —  a2y  =  c,  and 

(2.7) 

—a\xy  +  a2  —  d. 

(2.8) 

Solving  for  d\  using  Equations  (2.5)  and  (2.7)  gives 

a  —  cx 

ai  1  +  X2  ' 

(2.9) 

Similarly  using  Equations  (2.6)  and  (2.8)  to  solve  for  <r2  gives 

_  d  +  bx 

1  +  X2  * 

(2.10) 

Substituting  the  two  equations  shown  above  into  Equations  (2.6)  and  (2.7),  and  solving 
for  y  gives 

b  —  dx  c  +  ax 

V  = -  and  y  =  -- 

a  —  cx  d  +  bx 

respectively.  Eliminating  y  using  the  above  two  expressions  and  solving  for  x  yields  the 
two  solutions 


£+  =  Z  +  \/T+  Z 2 


and 


=  z-  y/l  +  . 


where 


z  = 


a2  +  b2  _  c2  _  d2 
2  (ac  +  bd) 


(2.11) 

(2.12) 


Now  we  want  to  determine  the  condition  number  k,  which  is  merely  the  ratio  of  the 
largest  to  smallest  singular  values  of  the  matrix.  We  do  not  know,  at  this  stage,  which 
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of  d\  or  ?2  is  the  larger.  Let’s  denote  the  first  possibility  as  k\2 (x)  =  and  the 

second  possibility  as  (x)  =  02/0*1  *  (Note  that  from  the  definitions  of  a  we  know  that 
axjd^  —  (Ji/cr2-)  If  can  be  shown  that  fti2(x_)  —  /<v2i(x+)  and  that  ^12(2+)  =  tt2i(x_) 
(which  incidentally  explains  why  we  have  two  solutions  for  x),  and  hence  from  now  on  we 
only  consider  ^12. 

We  can  write  down  ^12  using  Equations  (2.9)  and  (2.10)  as 

a  —  ex 
d  +  bx ’ 

and  then  substituting  Equations  (2.11)  and  (2.12)  gives 

a2  +  b2  +  c2  +  d2  ±  yj [(a  —  d)2  +  (b  +  c)2]  [(a  +  d)2  +  (b  —  c)2] 

Kl2  2  (ad  —  be) 

It  is  the  positive  sign  in  front  of  the  radical  that  maximises  the  above  expression,  and 
hence  we  take  the  positive  case.  Also  note  that  the  sign  of  the  above  expression  depends 
on  the  sign  of  the  determinant  of  A  (that  is,  the  sign  of  ad  —  be).  Remember  that  the 
formulation  initially  involved  two  frequency  parameters  6  and  a  (see  Equation  (2.4)). 
Adding  7 r  to  either  9  or  a  yields  the  same  magnitude  singular  values,  but  with  opposite 
sign.  However,  we  are  only  interested  in  the  magnitude  and  hence  take  the  absolute  value 
of  the  denominator  in  the  final  expression  of  the  condition  number  shown  below 

a2  +  b2  +  c2  +  d2  +  yj [(a  —~d)2  +  (6  -h  e)2]  [(a  +  d)2  +  (b  -  c)2] 

2\ad  —  bc\ 

The  condition  number  developed  in  this  section  suffers  from  complexity,  especially 
when  we  try  to  determine  under  what  conditions  the  matrix  is  well-conditioned.  As  such, 
a  simpler  formulation  is  attempted  below. 


2.3  Using  Norms  to  Determine  the  Condition  Number 

In  this  subsection  we  develop  the  condition  number  of  a  two-by-two  matrix  A,  with 
the  form  shown  in  Equation  (2.3),  using  the  2- norm.  This  formulation  has  the  advantage 
of  simplicity,  but  we  loose  some  information,  namely  the  null  space  directions  given  by 
the  angle  a  in  the  previous  section.  However,  since  we  are  only  interested  in  the  condition 
number,  this  loss  of  information  is  of  no  concern.  If  necessary  the  angle  a  can  be  obtained 
by  solving  the  simple  system  VT  =  (U D)~l  A. 

We  know  that  the  condition  number  may  be  expressed  as  the  ratio  of  singular  values, 
namely  n  =  crmax/crmin.  But  we  also  know  that  the  maximum  and  minimum  singular 
values  are  related  to  the  2-norm  (see  Golub  and  van  Loan  [10]  for  example)  by 

ermax  =  max  ||Ax||2  and  crmin  =  min  ||Aas||2.  (2-13) 

11*112  =  1  ll®l|2  =  l 


Remember  that  for  the  two-dimensional  case  these  singular  values  have  geometric  inter¬ 
pretations.  We  are  either  maximising  or  minimising  over  the  unit  circle,  | |a?| I2  =  1.  The 
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matrix  A  maps  the  unit  circle  onto  an  ellipse,  which  has  semimajor  and  semiminor  axes 
given  by  the  maximum  and  minimum  singular  values  (see  Figure  2.1). 


Now,  if  the  vector  x  is  of  unit  length,  then  the  transformation  of  x  under  A  may  be 
written  as 


Ax  = 


a  b 

cos# 

c  d 

sin# 

Remember  that  the  2-norm  (or  Euclidean  norm)  of  a  vector  x  —  [aq,  X2 , . . .  1xn]T  is 

given  by  ||a?||2  =  \/x{  +  x\  H - b  Let  e  be  the  square  of  the  2-norm  of  the  vector 

Ax ,  that  is  e  =  | |>4a?| |§-  Then  using  the  trigonometric  relations  cos2#  =  (1  +  cos2#)/2 
and  sin# cos#  =  (sin2#)/2,  after  a  small  amount  of  simplification  we  can  write  down  e  as 

e  =  ^(a2  +  b2  +  c2  +  d2)  +  ^(u2  —  62  +  c2  —  d2)  cos  2#  +  ( ab  +  cd )  sin  2#.  (2.14) 


We  want  to  both  maximise  and  minimise  e  (see  Equation  (2.13)).  Thus  we  differentiate 
€  with  respect  to  #,  and  then  solve  for  de/d#  =  0  giving 

tan  2#  —  — , 

72 

where 


71  =  2(ab+  cd), 

72  =  a2  —  b2  +  c2  —  d2,  and 

73  =  a2  +  b2  +  c2  +  d2.  (2.15) 


(We  will  use  the  definition  for  73  later.)  Using  trigonometry,  if  tan  2#  =  71/72  then 


cos  2#  = 


72 


■72 


and  sin  2#  = 


7i 


y/a  +  72’ 

Substituting  these  two  expressions  into  Equation  (2.14)  and  simplifying  yields 


(2.16) 


which  is  the  square  of  either  the  maximum  or  minimum  singular  value.  We  will  show  later 
that  ei  is  in  fact  the  square  of  the  maximum  singular  value.  Thus  to  find  the  minimum  all 
we  need  do  is  add  it/ 2  to  the  value  of  #  (see  Figure  2.1).  From  trigonometry  we  know  that 
sin[2(#  +  7r/2)]  =  —  sin  2#  and  cos[2(#  +  7r/2)]  =  —  cos  2#.  Thus  substituting  the  negative 
forms  of  Equation  (2.16)  into  e,  given  by  Equation  (2.14),  yields 


e2  =  \  (73  -  a/7+T| 

Since  73  is  always  positive,  whenever  A  ^  0,  we  now  see  that  ei  >  e2,  and  hence  <rmax  = 
y/el  and  crmin  =  The  square  of  the  condition  number  is  then  simply 


2  ei 

K  —  — 

^2 

,  2 
- 1  +  — 73 — 

V7i  +  72 
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Finally,  the  condition  number  in  terms  of  the  matrix  elements  is  given  by 


K  = 


2 

_ a2-\-b2-\-c2+d2 _ 

yT  (ab+cd)2+(a2-b2+c2—d2)2 


(2.18) 


It  is  interesting  to  note  that  the  7 i  may  be  written  compactly  in  terms  of  the  columns 
of  matrix  A.  Let  u  —  [a,  c]T  and  v  =  [b:d]T  be  the  first  and  second  columns  of  A 
respectively,  that  is  A  =  [u,  v],  then  71  =  2 uTv,  72  =  uTu  —  vTv,  and  73  =  uTu  -\-vTv. 

We  now  know  that  the  condition  number  of  a  two-by-two  matrix  is  given  by  Equa¬ 
tion  (2.18),  which  allows  us  to  investigate  the  relation  between  gauge  configuration  and 
ill-conditioning. 


2.4  Special  Cases:  Perfect-  and  Ill-Conditioning 


We  want  to  determine  when  the  matrix  A  is  perfectly-conditioned  (k  =  1,  which 
incidentally  is  the  minimum),  and  when  it  is  ill-conditioned  (k  — >  00). 

We  will  consider  the  ill-conditioned  case  first  since  it  is  straight  forward.  From  Equa¬ 
tion  (2.17)  we  see  that  k  — >  00  as  73 / (7^  +  7I)  — >  1.  Simplifying  the  latter  expression  we 
end  up  with  the  condition  ad  =  6c,  which  is  satisfied  whenever  the  determinant  is  zero, 
that  is  \A\  =  0. 

We  now  consider  the  perfectly  conditioned  case.  In  order  for  k  to  be  unity  we  require 
that  the  expression  73/^7^  +  7!  tend  to  infinity.  Note  that  this  expression  is  strictly 
non-negative,  and  hence  we  need  not  consider  the  case  of  negative  infinity.  For  non-zero 
matrices  (A  ^  0)  we  know  that  73  is  positive  (see  Equation  (2.15)),  and  hence  k  —  1 
implies  that  7i  +  7!  =  0.  Solving  for  the  matrix  element  a  gives  the  four  solutions 


a  ~  —d  —  i\b  —  c| ,  a  =  —  d  +  i\b  —  c|,  a  =  d  —  i|6  +  c|,  and  a  =  d  +  i|6  +  c|, 


where  i  =  Similar  expressions  are  obtained  if  we  solve  for  one  of  the  other  elements 

6,  c,  or  d  instead  of  solving  for  a.  Notice  that  all  these  solutions  have  an  imaginary 
component,  unless  6  =  =bc.  Thus  if  all  the  matrix  elements  are  real,  then  the  only  way  to 
achieve  perfect  conditioning  is  if  6  —  ±c  and  a  =  That  is,  k  =  1  only  when  the  matrix 
A  has  the  form 


a  b 
±b  a 


(2.19) 


We  found  that  a  two-by-two  matrix  is  ill-conditioned  when  the  determinant  is  zero, 
and  perfectly-conditioned  when  it  has  the  form  in  Equation  (2.19).  (Note,  however,  that 
a  small  determinant  does  not  imply  an  ill-conditioned  system;  and  reflexively,  a  large 
determinant  does  not  imply  a  well-conditioned  system  [10,  p.  81]  [33,  p.  70].) 
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3  Solution  of  a  Determinate  Simple  Truss 

In  this  section  we  numerically  simulate  the  stress  in  a  simple  truss.  The  rationale  for 
this  simulation  is  to  demonstrate  that,  in  a  real  helicopter,  it  is  possible  to  predict  the 
stress  in  a  rotor  component  from  strain  gauge  measurements  on  fixed  airframe  members. 
The  simple  simulation  developed  in  this  section  only  looks  at  effects  we  are  currently 
interested  in  and  ignores  effects  we  are  not  yet  modelling  (for  example,  vibration). 

Although  we  only  investigate  the  effects  of  rotor  loads  on  the  stress,  we  can  easily  take 
non-load  related  measurements,  such  as  airspeed  and  accelerations.  In  fact,  provided  we 
know  the  functional  form  between  the  stress  at  the  rotor  component  and  this  measured 
parameter,  we  can  still  develop  a  linear  model,  which  is  exactly  what  we  do  in  this  section. 


3.1  Comparison  of  Vector  and  Matrix  Techniques 

We  noted  some  anomalous  behaviour  by  a  ten  member  truss  in  an  earlier  report  [25]. 
Namely,  two  apparently  similar  inversion  techniques  produced  different  results  when  the 
underlying  system  of  equations  took  on  certain  ill-conditioned  forms  (in  particular  collinear- 

ity)- 


p,  1  . . . . .  , 

Vector  Technique 

^Px  jfM''-. 

22 

I/S  r 

<T0,  <71,(72  - ""!l . '  >  °b  —  [<*1 

Matrix  Technique 


LS 

Vl,Px,Py  - > 


@2 i  Px >  Py 


LS 


0-1 

fin 

fil2 

'Px 

inv  ^ 

'Px 

fiu 

fii  2 

pi 

a2. 

P21 

fi‘22_ 

[Py\ 

Py . 

fin 

fi22_ 

cr  2_ 

US 


a0,Px,Py  >  0b  —  [An  P02] 


ao=  [od  <22] 


& 2 
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Figure  3.1:  Schematic  of  the  vector  and  matrix  techniques.  Px  and  Py 
are  external  loads  and  the  Oj  are  stresses.  The  operators  LS  and  inv 
represent  the  least  squares  solution  and  inversion  procedure  respectively. 
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Figure  3.1  schematically  illustrates  the  main  procedures  in  both  these  techniques, 
termed  the  vector  and  matrix  techniques.  Note  that  the  final  form  of  the  solution  is 
the  same  for  both  techniques,  namely  a  linear  model  of  the  stress  at  the  zeroth  gauge  (ao) 
using  the  stresses  at  the  first  and  second  strain  gauges  (a\  and  a2  respectively).  However, 
the  matrix  technique  additionally  uses  measured  external  loading  (Px  and  Py)  as  an  in¬ 
termediate  step  in  determining  the  final  solution  (that  is,  during  calibration).  Once  the 
constant  coefficients  ot\  and  a2  have  been  determined,  the  external  loading  (represented 
by  Px  and  Py)  is  no  longer  required. 

The  coefficients  a*  in  the  vector  and  matrix  techniques,  and  0ij  in  the  matrix  technique 
are  all  constant  and  determined  from  the  least  squares  (LS)  solution.  The  operators 
(depicted  above  the  arrows  in  shaded  boxes)  are  abbreviated  by  LS  for  the  least  squares 
solution  and  inv  for  the  inversion  procedure.  For  the  simple  example  depicted  in  this 
schema  we  know  that  oq  and  a2  for  the  matrix  technique  are  given  respectively  by 


and 


ol\ 


001022  ~  002021 
011022  ~  012021 


OL2  — 


00 2011  ~  001012 
011022  ~  012021 


(3.1) 


(3.2) 


The  vector  technique  involves  determining  the  LS  fit  to  the  m  data  points  (ao*,  a^,  ct2;) 
for  i  =  1, 2, . . . ,  m.  The  matrix  technique  involves  determining  LS  fits  of  all  three  stresses 
(ao,  cri,  and  cr2)  using  the  external  loads  (Px  and  Py).  Thus,  given  the  data  set  (Px*,  Py*,  ao^,  &i u  °2 i) 
for  i  =  1, 2, . . . ,  m,  we  can  determine  the  LS  fits  for  the  three  stresses.  We  first  invert  the 
system  of  equations  involving  the  stresses  ai  and  a2,  so  that  the  external  loadings  ( Px  and 
Py)  are  now  given  as  a  function  of  a\  and  a2.  Substituting  these  expressions  for  Px  and 
Py  into  the  LS  solution  for  ao  in  terms  of  Px  and  Py  yields  the  desired  transfer  function 
relating  the  stresses  a\  and  a2  to  the  stress  ao  (see  Figure  3.1). 

An  alternative,  geometric,  interpretation  of  the  vector  and  matrix  techniques  is  de¬ 
picted  in  Figure  3.2.  As  can  be  seen  for  the  vector  technique,  Figure  3.2(a),  the  plane  that 
predicts  the  stress  at  the  zeroth  gauge  is  determined  using  LS.  (The  LS  technique  merely 
minimises  the  sum  of  distances  squared  between  the  data  points  and  the  resulting  plane.) 

Points  above  the  plane  are  shown  in  a  light  grey,  while  points  below  the  plane  are  depicted 
in  a  darker  grey.  Notice  that  the  plane  (by  its  very  nature)  linearly  relates  the  stress  in 
the  zeroth  gauge  to  the  stresses  in  the  first  and  second  gauges,  that  is  <to  =  Qqai  +  02^2- 
Once  the  LS  ao  plane  is  determined,  then  for  any  given  stresses  a\  =  a\  and  a2  =  a\  the 
stress  at  the  zeroth  gauge  is  given  by  ctq,  the  height  to  the  LS  ao  plane  (see  Figure  3.2). 

In  contrast,  the  matrix  technique  predicts  the  stress  at  the  zeroth  gauge  using  a  more 
convoluted  process.  For  clarity,  the  data  points  used  to  construct  the  LS  planes  are 
omitted  in  Figure  3.2(b)  (the  matrix  technique).  Three  LS  planes  are  determined  using 
the  two  external  forces  Px  and  Py  and  the  three  stresses  cro,  ai,  and  a2.  Now,  given  the 
stress  a\  =  a*  we  determine  the  line  (in  the  Px-Py  plane)  that  causes  the  LS  a\  plane  to 
intersect  the  horizontal  plane  a*.  Similarly,  given  the  stress  a2  =  a\  we  determine  the 
line  in  the  Px-Py  plane  that  causes  the  LS  a2  plane  to  intersect  the  horizontal  plane  o\. 

The  intersection  of  these  two  lines  yields  a  coordinate  (P*,  P*),  which  is  used  to  evaluate 
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o  above  plane 
®  below  plane 


Figure  3.2:  Geometric  interpretation  of  the  (a)  vector  and  (b)  matrix 
techniques.  For  clarity  the  points  that  define  the  LS  planes  are  omitted 
from  the  matrix  technique  illustration  (b). 


the  stress  at  the  zeroth  gauge  Oq  using  the  LS  00  plane.  In  summary,  given  <r*  and  we 
determine  the  coordinate  (P*,  P*).  This  coordinate  allows  us  to  evaluate  cr*  from  the  LS 
cro  plane. 

We  have  seen  that  the  final  functional  form  of  the  vector  and  matrix  techniques  is 
identical,  the  difference  between  the  methods  being  the  procedure  used  to  obtain  that 
result.  In  the  vector  technique  we  use  the  LS  solution  to  develop  a  stress  transfer  function 
between  strain  gauges  on  the  truss;  no  external  loading  information  is  required.  The  matrix 
method  also  uses  the  LS  procedure,  but  additionally  requires  external  loading  information 
during  calibration. 

In  order  to  simplify  the  idealised  structure  from  the  earlier  report  [25],  we  will  now 
develop  a  determinate  truss  with  the  minimum  number  of  members  possible. 


3.2  Solution  of  a  Three  Member  Truss 

Consider  the  three  member  statically  determinate  truss  shown  in  Figure  3.3.  Joint  O 
is  pinned  to  the  ground  and  joint  B  is  connected  to  the  ground  by  a  roller  bearing,  all 
joints  are  pinned.  Joint  C  has  both  a  horizontal  load  Px  and  a  vertical  load  Py.  The 
three  strain  gauges  are  denoted  by  GOB  on  member  OB,  Goc  on  member  OC,  and  GBC  on 
member  BG. 

Denote  the  horizontal  and  vertical  distance  from  the  origin  (at  joint  0)  to  joint  C  by 
xc  and  yc  respectively.  Similarly  denote  the  horizontal  distance  from  joint  O  to  joint  B 
by  xB.  Let’s  non-dimensionalise  these  distances  by  the  vertical  distance  yc,  yielding  the 
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Figure  3.3:  Three  member  statically  determinate  truss ,  with  external 
loading  Px  and  Py.  The  three  strain  gauges  are  denoted  by  G0 B,  Goc, 
and  Gqc  . 


non-dimensional  quantities 


XB 

Vc 


and 


7  “ 


Vc 


We  can  now  express  any  enclosed  angle  within  this  triangle  using  non-dimensional  quan¬ 
tities  (3  and  7.  So,  for  example,  the  cosine  of  the  angle  enclosing  the  joints  0,  B ,  and  C  is 
given  by  cos dOBC  —  (/?  —  7)/ \J  1  +  (/?  —  7)2.  Furthermore,  let  all  three  members  have  unit 
cross-sectional  area. 

Solving  for  the  forces  in  each  member,  then  dividing  by  the  cross-sectional  area  (unity 
in  our  case)  gives  us  the  three  stresses 

Oob  =  (1  -  l/P)(Px  -  7 Py), 


Voc  — 


VT+~' 

T 


[Px  +  (P-  l)P% 


y  J  5 


and 


&BC  —  ~ 


sfi  +  (/?-  7)2 

P 


(Px  iPy)- 


We  see  that  the  stresses  at  gauges  G0b  and  GBC  are  just  multiples  of  each  other.  In  fact, 
a  little  reflection  on  the  problem  will  show  that  there  is  no  combination  of  two  external 
forces  that  will  generate  a  “proper”  system  of  equations.  Let’s  keep  the  same  triangular 
truss  shown  in  Figure  3.3,  but  shift  the  external  forces  to  different  joints  (not  necessarily 
applying  Px  and  Py  at  the  same  joint). 

Applying  either  external  force  Px  or  Py  at  joint  0  is  pointless  since  the  truss  will 
not  experience  any  stress.  Similarly,  the  roller  bearing  (not  the  truss)  will  take  out  any 
vertical  force  applied  at  joint  B.  Thus  apart  from  the  loading  shown  in  Figure  3.3,  there 
is  only  one  other  external  loading  configuration  that  will  produce  what  appears  to  be 
“sensible”  results;  namely  applying  a  vertical  force  at  joint  C  and  a  horizontal  force  at 
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joint  B.  However,  under  this  loading  configuration  the  stresses  at  gauges  Goc  and  GBC  are 
only  dependent  on  the  vertical  loading,  and  hence  independent  of  the  horizontal  loading 
at  joint  B.  Thus  the  stress  at  gauge  GOB  cannot  be  determined  from  the  stresses  at  gauges 
G()C  and  GBC  alone  under  this  loading  configuration. 

The  only  way  to  obtain  a  sensible  result  would  be  to  apply  either  (or  both)  of  the 
external  loads  somewhere  on  the  member  BC  (excluding  the  end  points).  This  loading 
configuration  would  generate  a  bending  moment  within  member  H(7,  which  would  be 
dependent  on  both  where  the  strain  is  measured  GBC  and  the  locations  of  the  external 
loads  applied  on  member  BC.  (To  determine  stress  we  would  additionally  require  structural 
information  such  as  the  second  moment  of  area  and  the  distance  from  the  neutral  axis  to 
the  strain  gauge.) 

So  as  not  to  obfuscate  the  properties  of  this  problem  with  unnecessary  information, 
a  five  member  truss  is  developed  in  the  next  section.  (Note  that  a  suitable  four  member 
statically  determinate  truss,  without  at  least  one  of  the  members  undergoing  bending, 
could  not  be  constructed.  Hence  we  construct  a  five  member  truss  in  the  next  section.) 


3.3  Solution  of  a  Five  Member  Truss 

In  this  section  we  will  investigate  the  vector  and  matrix  solution  techniques  using  the 
five  member  statically  determinate  truss  shown  in  Figure  3.4.  As  before,  all  joints  are 
pinned,  and  joints  0  and  B  are  grounded  via  a  pin  joint  and  a  roller  bearing  respectively. 


Figure  34 :  Five  member  statically  determinate  truss ,  with  external  load¬ 
ing  Px  and  Py. 


Note  that  if  the  vertical  load  Py  was  applied  at  joint  C  instead  of  joint  A,  then  mem¬ 
ber  AC  would  experience  no  stress,  reducing  the  truss  to  the  three  member  configuration 
shown  in  Figure  3.3.  (This  fact  is  easily  verified  by  considering  the  sum  of  vertical  forces 
on  joint  A.) 

Again,  denote  the  horizontal  and  vertical  distances  from  the  origin  (joint  0 )  to  joint  C 
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by  xc  and  yc  respectively.  Similarly,  the  horizontal  distances  from  the  origin  to  joints 
A  and  B  are  given  by  xA  and  xD  respectively.  Non-dimensionalising  all  distances  by  the 
vertical  height  yc  yields  the  following  non-dimensional  quantities 

xA  n  xb  a  xc 

a~ — ,  p= — ,  and  7— — . 

Vc  Vc  Vc 

To  maintain  the  general  geometry  shown  in  Figure  3.4  we  need  to  limit  the  ranges  of  the 
lengths  xa,  and  yc .  The  conditions  yc  ^  0  and  0  <  |a|  <  |/3|  together  with  a (3  >  0 
satisfies  this  criterion. 

Now,  using  simple  trigonometric  relations  both  the  sine  and  cosine  of  any  truss  angle 
may  be  written  in  terms  of  a,  /?,  and  7.  For  example,  the  cosine  of  the  angle  made  by 
joints  ABC  is  given  by  cos0ABC  =  (/?  -  7)/\/1  +  Using  these  trigonometric 

relationship,  we  can  develop  the  loads  in  every  truss  member,  and  (provided  we  have 
cross-sectional  areas)  also  the  stresses. 

If  all  the  truss  members  have  unit  cross-sectional  area,  then  the  stress  in  the  five 
members  are 


CToA  =  (1  -  l/P)Px  ~  7(1  -  Ol/P)Py, 

(3.3) 

aoc  —  p  [Px  +  {P  Py ]> 

(3.4) 

<Jab  =  (1  -  7 /P)(Px  ~  aPy), 

(3.5) 

<Jac  =  -V1  +  (7  “  a)2  Py, 

(3.6) 

and 


<7*0  -  _y/±±K2 2  (Px  -  aPy),  (3.7) 

where,  for  example,  aAB  is  the  stress  in  member  AB . 

Like  the  three  member  truss,  the  stress  in  member  AB  is  a  multiple  of  the  stress  in 
member  BC  (unless  /?  =  7).  Other  special  cases  are  obtained  for  different  combinations 
of  a,  /3,  and  7.  Figure  3.5  shows  the  five  member  truss  under  different  geometric  configu¬ 
rations.  Notice  the  two  special  configurations  (a)  a  =  7  and  (b)  /?  =  7.  If  member  AC  is 
perpendicular  to  member  OB  (that  is,  a  =  7),  then  the  stress  in  members  OA  and  AB  are 
equal,  and  both  are  proportional  to  the  stress  in  member  BC  (that  is,  a0A  —  ^ab  oc  ^bc)- 
While  if  member  BC  is  perpendicular  to  member  OB  (that  is,  (3  —  7),  then  the  stress 
in  members  CM  and  AC  are  proportional,  and  the  stress  in  member  AB  is  zero  (that  is, 

(7 OA  oc  (7 AC  and  (7 AB  =  0). 

To  obtain  sensible  stress  solutions  for  a  truss  with  two  external  loads,  the  minimum 
number  of  beam  members  found  to  be  suitable  was  five.  For  truss  beams  of  varying 
length  (and  hence  geometries),  we  developed  the  stress  equations  for  each  member  of  the 
truss.  We  will  use  this  truss  in  simulations  to  support  theoretical  results  obtained  in  the 
remainder  of  this  report. 
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Figure  3.5:  Five  member  truss  under  two  special  cases  (a)  0  =  7  and 
(b)f3  =  r 

3.4  Simulation  of  a  Five  Member  Truss 

The  results  presented  in  this  subsection  have  already  been  reported  elsewhere  [25].  and 
hence  we  summarise  some  of  the  main  findings. 

Consider  the  five  member  truss  developed  in  §  3.3  undergoing  random  loading,  that 
is,  Px  and  Py  are  random  and  independent  forces.  We  will  simulate  the  stress  in  one  truss 
member  using  the  stress  at  another  two  truss  members,  we  choose  a()C  =  f{aOMaAB). 
This  combination  of  members  was  chosen  for  simulation  through  a  process  of  elimination. 
As  shown  in  the  previous  section,  the  stresses  in  members  A B  and  BC  are  proportional 
and  hence  could  not  be  used  concurrently  in  this  sort  of  simulation.  Member  AC  is 
unaffected  by  the  horizontal  loading  PX:  and  hence  would  make  this  sort  of  simulation 
relatively  simple.  Given  this  information,  the  only  two  choices  remaining  were  (i)  choosing 
member  A B  or  member  BC  and  (ii)  choosing  which  of  the  three  chosen  members  would  be 
the  dependent  member.  The  decisions  to  choose  AB  instead  of  BC,  and  choose  OC  as  the 
dependent  member  were  made  for  the  following  reason;  under  the  above  truss  geometry 
members  OA  and  AB  should  have  a  high  degree  of  correlation.  This  correlation  would 
allow  us  to  investigate  the  effects  of  system  collinearity  and  noise  on  stress  estimation. 

Using  Equations  (3.3)-(3.5)  we  can  solve  for  the  stress  in  member  OC  in  terms  of  the 
stresses  in  members  OA  and  A  B,  thus  giving 


aoc  — 


y/ 1  +  72 
a  —  7 


<J()A  + 


—  a\  aOA  +  ci2  crAB. 


(3.8) 


In  the  above  expression  goa  and  cf AB  are  the  system  inputs  and  goc  is  the  system  output. 
If  the  geometry  of  the  truss  is  fixed  by  setting  the  non-dimensional  lengths  a  =  1.000, 
/?  =  2.000,  and  7  =  1.100,  then  the  above  equation  with  four  digit  precision  becomes 


Goc  —  14.87a0>l  -f  16.52ctab.  (3.9) 

The  above  geometry  results  in  a  truss  with  members  OA  and  AB  having  the  same  length, 
and  joint  C  being  slightly  to  the  right  of  joint  A. 
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If  both  inputs  (crOA  and  cra&)  contain  no  noise,  then  the  least  squares  (LS)  method 
would  recover  the  above  expression  (provided  enough  measurement  points  were  used). 
However,  when  there  is  noise  in  the  input  measurements,  the  vector  method  produces 
results  with  increasing  error  as  the  number  of  points  in  the  LS  approximation  increases. 
On  the  contrary,  the  matrix  method  produces  results  with  decreasing  error  as  the  number 
of  points  in  the  LS  approximation  increases. 

We  now  describe  a  numerical  simulation  of  this  five-member  truss.  Using  random 
input  loads  Px  and  Py  in  Equations  (3.3)-(3.5)  we  obtain  the  exact  stress  in  the  three 
members  OA ,  OC ,  and  AB.  We  add  white  noise  in  the  range  ±10%  to  these  exact  loads 
and  exact  stresses  to  obtain  our  measured  results  Px ,  Py,  aOA ,  <foc ,  and  dAB.  For  example, 
the  measured  stress  in  the  member  OA  is  aOA  =  crOA(l  ±  e),  where  e  €  [—0.10,  ±0.10]  is  a 
random  number  with  uniform  distribution  (white  noise). 


Figure  3.6 :  Distribution  of  LS  errors  for  1000  different  measurement 
sample  sets.  Each  sample  set  contains  128  data  points.  (Results  from 
an  earlier  report  [25].) 


Figure  3.6  shows  the  distribution  (or  frequency  plot)  of  normalised  errors  when  the 
LS  method  is  used  to  develop  an  approximation  to  Equation  (3.9).  The  vertical  scale 
measures  the  error  of  the  LS  approximation  (on  a  logarithmic  scale),  while  the  horizontal 
scale  shows  how  much  of  the  data  falls  into  each  error  bin.  The  distribution  was  parti¬ 
tioned  into  16  data  bins  of  equal  width  on  a  logarithmic  scale.  The  distribution  includes 
1000  measurement  samples  (or  batches);  each  measurement  sample  set  contains  128  data 
points.  In  other  words  128  points  were  used  for  each  LS  approximation  (the  order  of  the 
LS  approximation),  and  there  were  1000  LS  approximations  (the  number  of  batches  in  the 
distribution).  By  data  point  we  mean  an  input  measurement  (PXi  Py^oA^oc^AB)  at  a 
particular  point  in  time.  The  vector  technique  does  not  use  the  external  loading  informa¬ 
tion  Px  and  Py  at  all.  In  contrast  the  matrix  technique  only  uses  this  loading  information 
to  determine  the  coefficient  in  Equation  (3.9)  (that  is,  during  calibration).  Once  these 
coefficients  have  been  determined,  the  matrix  technique  no  longer  requires  this  external 
loading  information. 
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We  have  defined  the  normalised  error  of  the  LS  method  as 

,.  ,  IIS  —  a!  I2 

normalised  error  =  - - — 

IWI2 

where  a  —  [ai,a2]T  and  a  =  [ai,  af\T  are  the  exact  and  approximate  coefficient  vectors 
for  Equation  (3.8)  and  1 1  x  1 1 2  —  xT  x  is  the  2-norm  (or  Euclidean  norm)  of  the  vector  x. 

As  can  be  seen  from  Figure  3.6,  for  a  128  point  LS  approximation  the  matrix  tech¬ 
nique  produces  more  accurate  results  on  average.  Both  vector  and  matrix  solutions  have 
approximately  a  log-normal  distribution  of  errors,  with  a  small  amount  of  skewness. 


5  10  50  100  500  1000 

Approximation  Order  (pts  in  LS) 


Figure  3.7:  Comparison  of  the  normalised  errors  for  the  vector  and  ma¬ 
trix  techniques.  (Results  from  an  earlier  report  [25].) 


If  we  were  to  generate  similar  distributions  for  LS  approximations  with  2, 4, 8, ... ,  1024 
points,  then  we  would  obtain  Figure  3.7.  Imagine  placing  Figure  3.6  perpendicular  to 
Figure  3.7  at  the  128  point  line  on  the  Approximation  Order  axis,  so  that  the  distribution 
projected  out  of  the  page.  If  we  were  to  repeat  this  procedure  for  all  the  LS  distributions 
using  2, 4, 8, ... ,  1024  points  and  produce  a  contour  plot  of  the  height,  then  we  would 
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obtain  Figure  3.7.  The  four  contour  lines  represent  the  lowest  outlier,  first  quartile  (Q i, 
contains  the  lowest  25%  of  the  errors),  median,  third  quartile  (Q3,  contains  the  lowest  75% 
of  the  errors),  and  upper  outlier  of  each  LS  distribution.  In  other  words,  Figure  3.7  shows 
a  continuous  version  of  a  box-plot.  Again  each  distribution  has  1000  sample  batches. 
(For  more  information  on  the  box- plot,  also  termed  the  box- and- whisker  plot,  see  any 
introductory  statistics  book;  for  example  Goldman  and  Weinber  [8].) 

In  Figure  3.7  the  vertical  axis  indicates  the  normalised  error  of  the  LS  approximation, 
while  the  horizontal  axis  indicates  the  order  of  (or  the  number  of  points  in)  the  LS  ap¬ 
proximation.  As  we  would  expect,  the  matrix  technique  accuracy  improves  as  we  increase 
the  number  of  data  points  used  to  develop  the  LS  approximation.  On  the  contrary,  and 
unexpectedly,  the  vector  technique  produces  marginally  increasing  errors  as  we  increase 
the  LS  approximation  order  (except  for  the  change  from  the  2  to  4  point  distributions). 
Both  techniques  converge  to  a  solution,  but  the  vector  technique  converges  to  an  erroneous 
solution.  (We  will  later  show  that  this  incorrect  convergence  is  due  to  the  inconsistency 
of  the  LS  technique  when  the  measured  input  data  contain  noise.) 

A  slight  modification  of  the  vector  technique  will  improve  the  results.  As  we  noted  in 
an  earlier  report  [25,  p.  29]  using  the  median  of  several  low  order  LS  approximations  we 
can  improve  our  estimates  of  the  coefficients  a\  and  <22  in  Equation  (3.8).  The  reason  we 
use  the  median  instead  of  the  mean  is  that  we  expect  some  of  the  LS  estimates  will  be 
outliers.  In  fact,  from  the  singular  value  decomposition  theory  presented  in  the  appendix 
of  an  earlier  report  [25,  §  E.l]  we  saw  anecdotal  evidence  to  support  the  existence  of  these 
outliers. 


Figure  3.8:  The  normalised  error  of  the  median  solutions  of  both  the 
vector  (dashed  line)  and  matrix  (solid  line)  techniques. 


Figure  3.8  shows  the  result  of  applying  this  modification  (the  median  of  low  order  LS 
estimates)  to  the  vector  technique.  Do  not  confuse  these  last  two  figures.  In  Figures  3.7 
we  have  plotted  the  median  (along  with  outliers  and  quartiles)  of  the  distribution  of 
normalised  errors  (that  is,  the  median  of  the  LS  errors ).  In  contrast,  in  Figure  3.8  we 
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have  plotted  the  normalised  error  of  the  LS  coefficient  medians  (that  is,  the  error  of  the 
LS  median).  For  example,  for  a  2  point  LS  solution,  the  median  of  the  LS  errors  is 
approximately  0.4  (see  Figure  3.7)  while  the  error  of  the  LS  median  is  approximately  0.1 
(see  Figure  3.8). 

In  F igure  3.8  we  see  that  as  the  number  of  points  used  to  develop  the  LS  approximation 
increases  the  normalised  error  also  increases.  In  §  4  we  show  the  reason  that  the  error 
tails  off  in  Figure  3.8  is  that  the  vector  technique  returns  inconsistent  solutions  when  noise 
is  present  in  the  measurements.  Notice  that  the  errors  of  this  modified  vector  technique 
are  still  greater  than  the  errors  of  the  higher  order  solutions  of  the  matrix  technique. 
Furthermore,  this  same  modification  can  be  applied  to  the  matrix  technique,  which  will 
subsequently  improve  the  results  of  the  matrix  technique  (see  Figure  3.8). 

We  reviewed  anomalies  associated  with  the  development  of  stress  transfer  functions 
(STFs)  using  simulated  strain  gauge  measurements  (superimposed  with  random  noise). 
Namely,  the  vector  technique  was  found  to  deteriorate  as  the  number  of  points,  used  to 
develop  the  STF,  increased.  The  matrix  technique  did  not  suffer  this  deterioration;  in 
contrast,  it  improved  proportionately  to  the  number  of  points  used  to  develop  the  STF. 
We  also  found  that  the  both  the  vector  and  matrix  techniques  improved,  when  we  took 
the  median  of  several  solutions  of  the  same  order  of  accuracy. 

We  now  turn  out  attention  to  the  LS  problem,  and  in  the  next  section  (deriving  results 
from  scratch)  we  show  how  the  characteristics  of  the  LS  solution  can  be  exploited  to  our 
benefit. 
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4  Least  Squares  (LS)  Fit  from  a  Statistical 

Perspective 

In  this  section  we  show  why  the  vector  technique  produces  larger  errors  than  the  matrix 
technique  for  ill-conditioned  problems.  We  first  investigate  the  simple  two-dimensional 
system,  and  then  generalise  these  results  to  the  n-dimensional  case. 

Models  of  noise  in  measurements  are  termed  “errors- in- variables”  (EIV)  models.  We 
cite  related  work  in  the  EIV  models  whenever  similar  results  have  been  published  in  other 
fields  (especially  the  field  of  econometrics). 

We  begin  this  section  by  investigating  a  simple  two-input/single-output  linear  system, 
where  both  the  input  and  output  are  contaminated  by  noise. 


4.1  LS  Approximation  with  Noise  in  both 
Input  and  Output  Measurements 

Let’s  approximate  the  two-dimensional  linear  system 

z(x,  y)  =  ax  +  by,  (4.1) 

which  we  term  the  true  solution,  by  a  least  squares  (LS)  fit  using  the  m  data  points 
Pi  =  (xj,  yi,  %)  for  i  =  1, 2, . . . ,  m.  If  the  data  points  pi  were  measured  (and  hence  contain 
noise),  then  we  will  get  some  approximation  to  Equation  (4.1)  given  by 

z(x,  y)  =  ax  +  by.  (4.2) 

We  generalise  the  above  system  to  an  n-dimensional  linear  system  in  §  4.6,  but  for  clarity 
we  restrict  ourselves  to  the  two-dimensional  case  shown  above  for  the  present. 

The  system  given  by  Equation  (4.1)  has  a  zero  intercept,  that  is,  z( 0, 0)  —  0.  However, 
using  the  substitutions  x  =  x'  —  #o,  V  =  yf  —  Vo,  and  z  =  zf  —  zq  Equation  (4.1)  can  be 
easily  expanded  to  the  more  general  system 

z1  -ZQ  =  a(x'  -  Xo)  +  b(y'  -  y0) 


or  equivalently 


zf  =  c  +  axf  +  by\ 


where  c  =  zo  —  ax o  —  byo ,  and  x\  yf:  and  z1  all  have  non-zero  means. 

Let’s  decompose  the  measured  inputs  X{  and  y^  and  the  measured  output  %  into  the 
true  inputs,  true  output,  and  noise  terms.  Thus  we  have  that 


Xi-Xi  +  Ui,  Vi  =  Vi  +  Vi,  and  %  =  Zi  +  Wi,  (4.3) 

where  x%  and  yt  are  the  true  inputs,  z%  is  the  true  output,  and  and  Wi  are  randomly 

distributed  noises. 
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To  determine  the  LS  fit  we  need  to  define  an  error  function  and  then  minimise  it  with 
respect  to  the  sought  coefficients  a  and  b.  Let  the  error  be 


.  m 

=  ^  £[*(*<.?<) -ai2- 


i—  1 


Substituting  Equations  (4.2)  and  (4.3)  into  the  above  error  function  yields 

1  m 

—  ~  ^  '  |®(*»  T  ui )  +  b(yi  +  Vi)  —  ( axi  +  byi  +  Wi) 


(4.4) 


Note  that  E  is  always  positive,  unless  z{xh  % )  =  %  for  i  =  1, 2, . . . ,  m  in  which  case  E  =  0. 

Minimising  the  above  error  function  with  respect  to  both  a  and  b  (that  is,  dE / da  =  0 
and  dE/db  =  0)  gives  that 


&XX  +  2s xu  4“  $uu 
4“  Sxv  4"  Syu  4” 


*xy 


Sxy  4“  $xv  4“  Syu  4" 

^2/2/  4”  2Siiy  “I"  Syy 

_  (l(sxx  +  Sxu)  +  b(sxy  +  Syu)  +  (sxw  +  Suw) 
_a(sxy  +  SXV)  +  b(Syy  +  Syv)  +  ( Syw  +  Svw)  ’ 


(4.5) 


where  Sij  is  the  covariance  of  the  parameters  i  and  j.  If  i  =  j  then  this  reduces  to  the 
variance  or  the  standard  deviation  squared,  that  is  Su  —  af.  For  example, 


SXy  — 


m 


m 

X] XiVi  =  cov(x,  v) 

i= 1 


and 


$uu 


1 

rn 


YlUi  =  var(w)- 


i=l 


(4.6) 


We  have  developed  a  linear  relationship  between  the  exact  coefficients  a  and  b  and 
the  approximate  coefficients  a  and  b  in  Equation  (4.5).  The  only  assumption  we  have 
made  so  far  is  that  our  predictive  system  is  linear.  As  stated  in  the  introduction  this 
merely  requires  that  the  predictive  system  be  a  linear  function  of  its  variable,  and  not 
that  the  variables  be  linear  in  terms  of  the  some  “global”  variables.  For  example,  the  two- 
dimensional  predictive  system  z(x,  y)  =  ax  +  by  (where  a  and  b  are  constants)  is  linear  in 
x  and  even  if  the  functions  :r(£,  rj)  and  y(£,  77)  are  highly  non-linear  in  terms  of  £  and  rj. 


4,2  Taylor  Series  Expansion  of  Noise  Terms 

Before  solving  the  two-by-two  system  given  by  Equation  (4.5)  we  make  some  approxi¬ 
mations.  We  defined  the  noise  terms  ,  v and  as  independent,  both  from  each  other 
and  from  the  input  and  output.  However,  we  also  want  to  investigate  the  behaviour  of 
the  solution  when  the  correlation  between  the  noise  terms  is  small,  and  thus  we  proceed 
with  a  series  expansion  analysis. 

Let  e  be  the  maximum  absolute  value  of  any  of  the  sijy  where  at  least  one  of  either  i 
or  j  is  a  noise  term  u ,  v,  or  w,  that  is 

£  —  max  \sxv\,  Is^l,  \syu\i  |Sj/v|>  |Syiu|,  \Suv\i  \sVw\j  •  (4.7) 
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We  can  then  define  the  coefficients  Cjj  in  terms  of  e  as  Cij  =  Sij/e,  and  hence  by  definition 
\cij\  <  1.  Then  using  Equation  (4.5)  and  a  Taylor’s  series  expansion  about  e  =  0,  the 
coefficient  2  has  the  expansion 


where 


~  OiQ 

,  / 

r  a2 

aoOl3  \  2\ 

a  —  — 

+  e 

— 

2  +  C>  {ez) 

a  1 

\ 

\al 

ai  ) 

=  ^0 

S xxSyy 

s“xy)  "1"  (asxx  "I"  bs. 

(5UU 

+  S 

xx )  (5m;  "b  Syy)  —  Sxy 

xyj 


+  0(e ), 


O0  CL  (^SxxSyy  S xy )  "b  $vv  {p*Sxx  "b  bsxy}  5 

Qq  =  ( Suu  +  SXx)  {$vv  "b  $yy)  $xy> 

C% 2  2 CLCyySxx  T  CL  (Cuv  T  2 CXy  "b  Cyu)  bCyy  T  $xy 

^  ( Cuv  T  Cxv  T  Cyu)  $yy  T  ( CUw  T  CLCxu  T  CXw  T  bCyU)  (Syy  T  Syy)  5 


(4.8) 

(4.9) 


and 


^3  Cyy  (suu  T  SXX )  ( CUV  “I"  Cxv  T  CyU)  SXy  T  CXU  {^Syy  T  Syy)  * 


Similarly,  the  coefficient  b  has  the  expansion 

&  ^SxxSyy  $xy)  *b  Suu  (,CLSXy  T  ^yy) 
( suu  T  $##)  {Syy  +  syy)  ~  sXy 


+  0(e), 


where 


(4.10) 


Po  ~  ^  ( SXXSyy  —  SXy^j  +  SUU  ( CLSXy  +  bSyy)  , 

Pi  =  (&uu  T  ^a:#)  ( &vv  T  Syy)  ^xyi 

P2  Cl  (c>UV  T  CXy  T  Cyu )  SXX  “b  ( Cyiy  ~\~  CLCXy  T  bCyy  T  Cyfy )  ( SUU  T  &XX  ) 

clcxu  T  cxw  T  b  {cUy  H-  cXy  T  2c^w)]  sXy  T  2 bcxuSyy, 


and 


Pz  Cyy  ( Suu  T  <5a:a)  {Cuv  T  Cxv  T  Cyu )  SXy  ~h  Cxu  (svv  T  Syy) 


As  an  important  aside,  we  want  to  determine  how  the  variance  of  e  decays  with  the 
number  of  points  used  to  develop  the  least  squares  (LS)  approximation.  From  Equa¬ 
tion  (4.7)  we  know  that  e  is  really  a  covariance  between  two  signals,  say  /  and  g,  so  that 
e  =  sfg  =  cov(/,  g).  From  the  definition  of  variance  (4.6)  we  have  that 
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but  covariance,  see  Equation  (4.6),  is  defined  as 


^  m 

e  =  cov(/ ,  g)  = 


3= 1 

where  n  is  the  number  of  samples  we’ve  taken  and  m  is  the  number  of  points  in  each  sample 
(for  simplicity  we  assume  m  is  the  same  for  all  samples).  Using  these  two  equations  we 
have  that 

2 


var(£)  = 


n  [  m  \ 

(x>^) 

*=1  V=1  / 


The  square  of  a  finite  sum  has  the  expansion 

2 


n  m-1  m 

Ead  =  Eaf+2E  E  aiak- 

\3  =  l  )  3- 1  3= 1  fe=j+l 


Hence  the  variance  of  e  may  be  expressed  as 


var(e)  =  -L 
ml 


777  ^  71  7ft  — ±  /ft  ^  /t 

—  YXfi39ij)  2  fijQij  fik9ik 


m-1  m. 


3=1  U  i= 1 


n 

j= i  fc=j+i  i=l 


Letting  n  — >  c owe  then  have  the  tidy  expression 

1 


var(e)  = 


E  var(/^) 

j=i 


m— 1  m 

+2E  E  *(/s/V), 

fc==7+i 


where  5 (a:)  is  the  expectation  (or  mean)  of  a;,  and  both  /  and  f  (and  also  g  and  5')  are 
considered  different  samples  of  the  same  function,  that  is,  statistical  copies  of  each  other 
and  hence  independent. 

Now,  if  /  and  g  are  independent  then  var (fg)  =  var(/)  var(g)  and  £{fgfg')  = 
£{f)£(9)£{f')£(s/)  (see,  for  example,  Ang  and  Tang  [3]).  Since  at  least  one  of  the  variables 
/  or  ^  is  a  noise  term  with  zero  mean,  we  have  that  £{fgfg')  =  0,  and  hence 


,  .  var(/)var(g) 

var(e)  = - — - 

m 


or  in  terms  of  standard  deviations 


(4.11) 


Hence  the  standard  deviation  of  the  term  e  will  reduce  to  zero  as  1/y/m,  where  m  is  the 
number  of  points  used  to  develop  the  LS  fit. 

Lloyd  [19]  cites  Cramer  [6]  as  a  source  for  a  similar  analysis  to  that  shown  above. 
Lloyd  terms  the  above  analysis  “sampling  moments  of  a  statistic”,  while  Cramer  terms 
it  “characteristics  of  sampling  distributions” ,  but  unfortunately  neither  author  gives  the 
result  shown  above.  Equation  (4.11)  is  a  special  case  of  a  result  stated  (without  proof)  by 
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Kendall  [14,  Eq.  (10.21)],  who  terms  the  above  analysis  the  “standard  errors  of  bivariate 
moments” . 

Lloyd  defines  the  variance  with  1  j(m  —  1)  instead  of  1/m  (as  we  defined  in  Equa¬ 
tion  (4.6)),  and  gives  four  good  reasons  for  doing  so  [19,  p.  24].  For  m  moderately  large 
the  difference  between  these  two  forms  will  be  negligible,  so  that  this  difference  will  not 
change  the  results  shown  above.  The  variance,  as  we’ve  defined  it  in  Equation  (4.6),  is 
termed  the  “maximum-likelihood  estimate  of  variance”  [21,  p.  417]  or  the  “sample  vari¬ 
ance”  [15,  p.  4]. 

Kendall  [15,  p.  5-6]  provides  a  useful  bias  correction  procedure  (developed  by  Que- 
nouille)  termed  the  “jack-knife”  technique,  which  is  reminiscent  of  Aitken  extrapolation 
in  numerical  analysis  (see  for  example  Atkinson  [4]).  Let  tm  be  a  biased  estimator  of  #, 
and  suppose  that  tm  has  the  expectation 


where  the  coefficients  ar  are  allowed  to  be  functions  of  6  but  not  of  m.  In  other  words,  tm 
estimates  6  to  order  1/m  (that  is,  £(tm)  =  6  +  0(1 /m)). 

From  the  m  observations  we  can  calculate  the  estimate  tm- 1;  in  fact,  we  can  calculate 
m  of  these  tm- 1  estimates  (since  there  are  m  possible  subsets  of  (m  —  1)  observations). 
Let  tm- 1  denote  the  mean  of  these  m  estimates  of  tm-i;  that  is,  im- 1  =  ~ 

A  new  estimate  of  6  is 


tm  —  m  t71 


(m  —  1)  t 


m—  1? 


(4.12) 


where  £(t'm)  =  9  +  0(1  /m2)  and  hence  tfm  is  a  second  order  (in  m)  approximation  of 
6.  This  jack-knife  principle  can  be  continued  recursively  (see  Kendall),  we  give  the  next 
estimate 

m2C  -  (m  —  l)2 


t"  = 

vrr> 


m 


2  _ 


(m  -  l)2 


(4.13) 


where  £(t'^n)  =  9  +  0(1 /ms)  shows  that  tnm  is  a  third  order  (in  m)  approximation  of  6 . 

Since  Equations  (4.9)  and  (4.10)  are  biased  to  order  1/m,  remember  that  e  =  0(l/m): 
we  can  use  the  jack-knife  bias  correction  technique.  We  will  later  show,  however,  that  not 
only  are  the  estimates  for  the  coefficients  given  by  Equations  (4.9)  and  (4.10)  biased,  they 
are  also  inconsistent.  (In  fact,  we  already  suspect  this  inconsistent  behaviour  from  the 
results  of  the  vector  technique  shown  in  Figure  3.7  on  page  18.) 

Under  the  assumption  of  uncorrelated  noise  we  derived  approximations  relating  the 
exact  and  approximate  coefficients  of  our  linear  predictive  system  (Equations  (4.9)  and 
(4.10)).  From  these  expressions  we  see  that  as  the  two  inputs  x  and  y  become  more 
correlated  the  denominators  in  Equations  (4.9)  and  (4.10)  tend  to  zero;  this  problem  is 
exacerbated  by  noise  in  the  signals.  Thus  our  coefficient  estimates  a  and  b  become  worse  as 
the  system  becomes  more  ill-conditioned  (even  when  a  small  amount  of  noise  is  present). 
We  have  also  seen  that  the  variance  of  the  bias  e  is  inversely  proportional  to  the  number  of 
points  used  to  develop  the  stress  transfer  function  (Equation  (4.11)).  Finally,  we  learned 
of  a  general  bias  correction  procedure  termed  the  jack-knife  technique. 
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4.3  Input  and  Output  Noise  Independent 


In  this  section  we  assume  that  all  noise  terms  are  uncorrelated  with  each  other  and 
uncorrelated  with  the  true  signals,  that  is,  e  =  0. 

Introduce  the  noise  terms  as 

&UU  J  Syv 

T)x  = -  and  T)y  =  — ,  (4.14) 

SXX  Syy 

and  the  correlation  coefficient  between  the  two  inputs  x  and  y  as 


r  = 


sxy 

y 8xxSyy 


(4.15) 


The  variables  ijx  and  rjy  may  be  thought  of  as  the  noise- to- signal  ratio  in  the  input  signals 
x  and  y  respectively,  and  note  that  rjx,riy  >  0.  Furthermore,  the  Cauchy  inequality  [1] 

states  that  xiVi)  <  ]C?=i  xi  Sj=i  Vj  >  with  the  equality  holding  only  if  X{  —  cyi  for 

some  constant  c,  and  thus  -1  <  r  <  1. 


If  6  =  0  then  the  coefficient  a  in  Equation  (4.9)  becomes 


a  =  a 


—  a 


(!  +  Vy)  ~r2  +  i 

_ ^  V  $XX 

(1  +%)(!  +  %)  ~r2 


1  Vx 


(!  +  %)(!  +Vy)-r2 


Similarly  b  can  be  written  as 


(4.16) 


b  =  b 


1  _  1  +  Vx  ~  KaTfox/s^AfrTfoV^) 

V  (1  +Vx)(l  +  Vy)  -r2 


(4.17) 


Using  a  more  generalised  approach  (the  assumption  of  independent  noise  was  not  used), 
Schneeweifi  [30]  develops  the  matrix  form  of  the  above  expression. 


The  two  terms  ci\/sxx  and  b\/ Syy  measure  the  importance  of  a  particular  parameter 
in  the  linear  relationship,  since  ft\/ sxx  includes  the  spread  of  the  true  signal  x  and  the 
coefficient  a.  In  fact  these  two  terms  are  really  just  a  scaling  of  the  original  equation 


z  —  ax  +  by 

=  (.Vte)  ~=  +  (»«  -f= 

V  ^XX  V  Syy 

where  xj  y/sxx  and  y/  yj sxx  are  non-dimension alised  versions  of  x  and  y  respectively  (that 
is,  they’ve  been  scaled  by  their  respective  standard  deviations). 

Making  some  simplifying  assumptions  we  get  the  underlying  form  of  the  above  expres¬ 
sions  for  a.  Assume  that  the  two  noise-to-signal  terms  are  equal  (that  is  rjx  —  r}y)  and  that 
the  variance  of  the  two  true  input  signals  are  also  equal  (that  is,  sxx  =  syy ).  Then  we  can 
simplify  the  above  expression  for  a  to 


a  =  a  1 


l1  +  V  ~  tl 

(1  +  y  +  r)(l  +7j  —  r) 


(4.18) 
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where  r\  —  rjx  =  rjy  is  a  noise-to-signal  term.  (The  coefficient  b  can  be  similarly  simplified.) 
We  can  re-arrange  the  above  equation  to  determine  the  relative  error  of  the  approximate 
coefficient  a  as 

s  i_  „  /A,n, 


(1  +  n  +  r)(l  +  n  -  r) ' 


(4.19) 


1 


(a= 2 


1  ;  89 


;  I 

b/a=l  0 

I!  I  : 


Figure  The  relative  error  of  the  approximate  coefficient  a  plotted 
against  correlation  r  and  noise  7]  for  several  values  of  the  ratio  of  exact 
coefficients  b/a. 


Figure  4.1  shows  contour  plots  of  the  relative  error  of  the  approximate  coefficient  a 
for  several  values  of  b/a  (the  ratio  of  exact  coefficients).  Notice  that  the  relative  error  is 
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not  symmetric  about  b/a  =  0,  that  is  negative  values  of  b/a  produce  different  results  to 
positive  values.  Also  note  from  Equation  (4.19)  that  the  approximated  coefficient  matches 
the  exact  coefficient  (a  =  a)  only  if  rj  =  0  or  rj  =  ( b/a)r  —  1.  The  second  condition, 
V  =  ( b/a)r  -  1,  suggests  that  under  some  conditions  the  effects  of  noise  exactly  cancel  the 
effects  of  correlated  inputs  to  yield  the  exact  solution! 

If  the  coefficients  of  the  two  inputs  have  the  same  magnitude  (|a|  =  |6|),  then  from 
Equation  (4.18)  the  numerator  and  one  of  the  terms  in  the  denominator  of  the  noise 
amplification  term  cancel,  and  hence  we  have  that 


a  =  a 


1  -rj 


1 


l  +  ri+\r\ 


(if  \a\  =  \b\). 


(4.20) 


The  modulus  of  the  correlation  (that  is,  |r|)  was  needed  in  the  above  equation  for  the 
following  reason.  First  note  that  the  term  r  b/a  is  always  positive,  since  if  the  coefficients 
have  different  signs  ( b/a  <  0),  then  the  correlation  must  also  be  negative  (r  <  0).  Now  if 
r  is  positive  then  the  modulus  operator  has  no  effect,  so  we  need  only  consider  the  case 
r  <  0.  If  the  correlation  is  negative  (r  <  0)  then  the  numerator  1  +  77  -  rb/a  cancels  out 
the  denominator  term  1  -t-  rj  +  r.  Hence  the  denominator  can  be  expressed  as  1  +  77  +  \r\. 

From  the  above  equation  we  see  that  input  correlation  is  no  longer  a  problem;  in  fact, 
it’s  beneficial.  This  last  statement  assumes  e  is  zero;  that  is  all  noise  terms  are  uncorrelated 
both  to  each  other  and  to  the  true  signals.  If  we  have  e  small  but  non-zero  (that  is, 
0  <  e  «  1),  then  we  need  to  temper  the  above  “beneficial”  statement  with  the  restriction 
that  |r|  is  not  too  large.  The  reason  for  this  restriction  on  r  is  that  the  coefficient  of  e  is 
proportional  to  1/a2  =  1/(1  +  77  —  r)2  (where  we  have  used  the  simplification  77  =  px  =  r\y 
in  Equation  (4.8)).  Hence  for  0  <  e  <  1  our  assumption  that  the  e  term  in  Equation  (4.8) 
remains  small  is  violated  as  r  — ►  1. 


We  now  see,  from  Equation  (4.20),  that  the  least  squares  (LS)  solution  will  always 
under-estimate  the  coefficient  a  whenever  there  is  some  noise  in  the  input  signal.  (This 
result  is  well  known  for  the  case  of  zero  correlation  r  =  0  and  is  termed  the  “errors-in- 
variables”  model.)  In  other  words,  the  slope  of  the  estimate  a  is  always  closer  to  zero  than 
the  true  slope  a.  This  under-estimation  is  termed  “attenuation  bias”  by  Johnston  and 
DiNardo  [13]. 

At  first  it  appears  from  Equation  (4.16)  that  scaling  the  inputs  x  and  y  might  improve 
a  (the  estimation  of  the  coefficient).  However,  using  the  notation  x'  —  x/cx  and  y'  =  y/cy 
(where  cx  and  Cy  are  constants),  we  can  show  that  scaling  has  no  effect  on  the  solution 
of  the  coefficients.  Let  a!  and  bf  be  the  associated  scaled  coefficients  a  and  b  respectively, 
then  from  the  relation  ax  =  a*x!  we  must  have  a '  =  acx ,  and  similarly  b'  =  bcy.  We  now 
see  that  a^fs^  —  a' y/sx/xt  and  hence,  from  Equation  (4.16),  scaling  the  inputs  x  and  y 
has  no  effect  on  estimating  a. 

For  rj  small,  or  more  precisely  rj2<^l  —  r2,  we  have  the  approximation 


a  ~  a 


I-77 


(l  —  |r) 

(1  +  r)(l  -  r) 


(if  ?72<1  -  r2). 


We  can  think  of  the  term  multiplying  77  as  the  noise  amplification  function .  For  example, 
if  we  have  10%  noise  in  the  input  measurements  and  the  noise  amplification  function  has  a 
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value  of  3,  then  the  approximate  coefficient  a  would  have  a  relative  error  of  3  x  10%  =  30%. 
The  restriction  rj1  -Cl  —  r2  implies  that  as  the  two  inputs  x  and  y  become  more  correlated 
the  noise-to-signal  ratio  must  decrease  by  at  least  a  proportional  amount  if  the  small  noise 
assumption  is  to  be  used. 
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Figure  f.2:  Absolute  value  of  the  noise  amplifying  function,  |(1  — 
rb/a)/(  1  —  r2)\,  plotted  against  correlation  r  and  ratio  of  exact  coeffi¬ 
cients  b/a,  assuming  a  small  amount  of  noise  (rj2  <  1-  r2 ).  The  white 
line  highlights  the  location  of  the  zero  noise  amplification  line .  The  black 
line  denotes  where  input  and  output  noise  are  equal. 


Figure  4.2  shows  a  contour  plot  of  the  noise  amplification  function  plotted  against  the 
ratio  of  exact  coefficients  b/a  and  the  correlation  r.  In  fact,  this  contour  plot  shows  the 
logarithm  of  the  absolute  value  of  the  noise  amplification,  that  is  log  |(1  —  rb/a)/(  1  —  r2)|, 
which  explains  the  exponential  scale  on  the  contour  legend.  Finally,  note  that  the  ticks 
marks  on  the  6/a-axis  are  neither  linear  nor  logarithmic.  The  ticks  were  in  fact  spaced 
out  using  an  inverse  hyperbolic  sine  function,  sinh-i(:r),  so  that  for  small  values  of  x  the 
ticks  are  spaced  almost  linearly,  while  for  \x\  large  the  spacing  becomes  logarithmic. 

The  minimum  noise  amplification  is  shown  as  a  white  line  r  —  1  /(b/a).  For  b/a  >  0 
and  b/a  <  0  all  contours  above  and  below  this  line  respectively  are  negative.  The  black 
lines  (r  =  0  and  r  =  b/a )  denote  the  locations  where  input  and  output  noise  are  equal,  that 
is  unit  amplification.  Again  notice  that  the  noise  amplification  function  is  not  symmetric 
about  b/a  —  0,  that  is  negative  values  of  b/a  produce  different  results  to  positive  values. 
However,  the  noise  amplification  is  anti-symmetric  about  r  ~  —b/a. 

In  this  section  we  ignored  the  bias  e,  since  we  can  make  it  as  small  as  we  like  (owing  to 
the  fact  that  e  =  0(l/m))  simply  by  using  more  points.  Thus  ignoring  bias  we  re-wrote 
the  relation  between  the  approximate  and  exact  coefficients  (a  and  a  respectively)  in  terms 
of  noise-to-signal  ratios  (j]x  and  rjy),  correlation  (r),  and  importance  weightings  {ay/'s 
and  byfsjfj).  These  relations  are  given  by  Equations  (4.16)  and  (4.17)  for  the  approximate 
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coefficients  a  and  b  respectively. 

To  determine  the  form  of  these  expressions  we  further  assumed  that  the  two  input 
signals  ( x  and  y)  and  associated  noises  ( u  and  v)  had  the  same  variance.  We  subsequently 
found  that  the  expression  relating  the  approximate  and  exact  coefficients,  Equation  (4.18), 
was  a  function  of  the  noise  to  signal  ratio  (77),  correlation  (r),  and  ratio  of  exact  coefficients 
( b/a ).  Our  approximate  coefficient  was  identical  to  the  exact  coefficient  (that  is,  we  obtain 
zero  error)  under  two  conditions;  when  the  noise  is  zero  (77  =  0)  or  when  the  noise  exactly 
cancels  the  correlation  (77  =  ( b/a)r  —  1,  which  was  a  surprising  result).  Contour  plots  of 
the  approximate  coefficient’s  relative  error  function,  Figure  4.1,  reinforced  these  results. 

We  found  that  if  the  exact  coefficients  had  the  same  magnitude  (|a|  =  |&|),  then 
correlation  was  no  longer  a  problem,  even  for  ill-conditioned  systems.  On  the  contrary, 
correlation  was  now  beneficial.  The  well  known  result  of  LS  under-prediction  (from  errors- 
in- variables  theory)  was  observed  in  this  simplified  form  (Equation  (4.20)). 

Assuming,  instead,  that  the  noise  was  much  closer  to  zero  than  the  correlation  was  to 
unity  ( rj 2  <l-r2),  we  gained  a  greater  understanding  of  the  effects  of  exact  coefficient  ratio 
(b/a)  on  errors.  A  contour  plot  of  the  noise  amplifying  function,  Figure  4.2,  emphasised  the 
importance  of  having  input  signals  of  similar  order  of  influence  on  the  output  signal  (that 
is,  b/a  ~  1).  From  this  same  plot  we  discovered  when  the  error  in  coefficient  estimation 
was  zero,  and  also  under  what  conditions  the  noise  would  be  amplified  detrimentally. 


4.4  Correcting  for  the  LS  Inconsistency 

In  this  subsection  we  approximate  the  true  least  squares  (LS)  coefficients  a  and  b  using 
noise  estimates,  input  correlation  estimates,  and  the  approximate  LS  coefficients  a  and 
b •  We  will  first  develop  results  for  the  simple  case  of  equal  noise  (in  all  inputs),  and 
then  generalise  this  result.  Before  beginning  however,  it  is  worthwhile  clarifying  the  terms 
“biased”  and  “consistent”. 

Kendall  [15,  p.  3]  defines  consistency  in  the  following  manner.  An  estimator  com¬ 
puted  from  a  sample  of  n  values,  is  said  to  be  consistent  if,  for  any  positive  (i  and  u,  there 
exists  some  N  such  that  the  probability  that  |i?n  —  $|  <  (i  is  greater  than  1  —  v  for  all 
n  >  AT.  In  the  notation  of  the  theory  of  probability, 

P  ||^n  ~  #|  <  >  1  —  z'j  for  all  n  >  N. 

In  other  words,  given  any  small  quantity  fi  we  can  find  a  large  enough  sample  size  such 
that,  for  all  samples  over  that  size,  the  probability  that  $  differs  from  the  true  value  by 
more  than  ji  is  as  near  zero  as  we  please.  (At  least  one  author  [16]  appears  to  use  the 
terms  “inconsistent”  and  “asymptotic  bias”  interchangeably.) 

If  we  require  that  for  all  n  the  mean  value  of  d  shall  be  7?,  then 

£{$n)  —  for  all  n 

defines  an  unbiased  estimator ,  where  £(•)  denotes  the  expectation  (or  mean).  Kendall 
states  that  the  median  or  mode,  instead  of  the  mean,  can  also  be  chosen  in  defining 
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an  “unbiased”  estimator.  Note  that  an  estimator  may  be  both  consistent  and  unbiased, 
consistent  and  biased,  or  inconsistent  and  unbiased;  that  is,  one  does  not  imply  the  other. 

We  reproduce  Equation  (4.18)  below 


a  —  a 


1-rj 


(l  +  v-  zr) 


(1  +  77  +  r)(l  +r]  —  r) 


+  0(e), 


and  similarly  for  b  we  have  the  expression 


b  =  b 


i  „  (1+7?-H 

'(1  +  r)  +  r)(l  +  T]  -r) 


+  0(e). 


The  above  two  equations  express  a  and  b  in  terms  of  a  and  b.  Thus  re-arranging  we  can 
solve  for  a  and  b  in  terms  of  a  and  b  giving  (after  a  great  deal  of  algebraic  manipulation) 


a  =  a 


1  + 


V  (!  ~  I7-) 
(1  +  r)  (1  -  r) 


0(6 


We  obtain  a  similar  expression  for  b.  To  use  the  above  consistency  correction  formula  we 
need  estimates  of  the  correlation  r  and  the  amount  of  noise  in  the  signal  rj. 

If  we  perform  the  same  analysis  on  the  more  general  form  of  the  coefficients  a  and  b 
given  by  Equations  (4.16)  and  (4.17)  we  obtain 


sXx  (1  -  r 2) 


+  0(e) 


(4.21) 


—  a 


1  + 


Vx  (1  “  <t>r) 


+  0(e), 


(1  +  r)  (1  -  r) 

where  rjx,  rjy ,  and  r  are  defined  by  Equations  (4.14)  and  (4.15)  respectively,  and 


(4.22) 


Remember  from  an  earlier  argument  that  dyjsxx  and  b^/s^  may  be  thought  of  as  impor¬ 
tance  weightings  of  the  inputs  (x  and  y)  in  determining  the  output  z.  Hence  the  above 
0  term  may  be  though  of  as  the  product  of  noise-term-ratio  (%/%)  and  importance¬ 
weighting-ratio  {by/s^/ia^s^c)).  Making  the  necessary  changes,  the  coefficient  b  has  the 
same  form  as  the  above  equation  for  a. 

For  the  above  consistency  correction  formula  to  be  of  any  use,  we  must  have  estimates 
of  the  noise  variances  (swu  and  svv)  and  the  true  input  signal  variances  (sxa;  and  syy). 
Obviously  the  accuracy  of  these  variance  estimates  will  reflect  the  accuracy  of  the  LS 
consistency  correction.  We  now  give  estimates  of  these  variances. 
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Remembering  that  measurements  of  true  values  are  denoted  by  a  hatted  notation,  for 
example  x *  =  Xi  +  U{  (see  Equation  (4.3)),  and  using  the  definition  of  variance  given  by 
Equation  (4.6)  we  know  that 


1  m 

-kx* 

2=1 

l  ™ 

=  +  Ui ^ 

2=1 

1  J71  1  771  1  m 

=  —  xi + 2—  E  + — yN? 

m,  Z - '  77?.  '  77?  7 


2=1  2=1 
=  $XX  4~  2s£W  +  Suu' 


2=1 


We  know,  however,  that  the  covariance  between  a  signal  and  noise  is  less  than  or  equal 
to  e  (see  Equation  (4.7)),  and  hence  sxu  —  O(e).  An  estimate  of  the  variance  of  the  true 
signal  x  is  then  given  by 


$xx  &xx  Suu  4"  0(e), 


(4.23) 


and  hence  we  must  have  an  estimate  of  the  input  noise  variance  suu.  One  way  to  obtain 
this  estimate  is  to  take  repeated  measurement  of  x  with  all  input  variables  (including 
x)  fixed. 


We  can  obtain  an  estimate  of  r,  the  correlation  coefficient  between  the  two  inputs,  in 
a  similar  fashion.  Using  Equation  (4.7)  we  obtain  that  sXy  =  sxy  +  0(e );  then  substituting 
this  result  and  Equation  (4.23)  into  Equation  (4.15)  gives  an  estimate  of  the  correlation 
coefficient 


r  = 


yj  ($xx  suu)(s 


yy 


4 -0(e), 


(4.24) 


where  we  have  again  made  use  of  the  input  noise  variances  suu  and  svv. 

Substituting  the  estimates  for  variances  sxx  and  syy  (Equation  (4.23))  and  for  the 
correlation  coefficient  (Equation  (4.24))  into  the  consistency  correction  formula  given  by 
Equation  (4.21)  gives 


a  =  a 


1  + 


suu{syy  svv)  svv  §  sxy 

i,SXX  SUu){Syy  Sw)  ~  Sfcy 


+  0{€). 


(4.25) 


Notice  that  the  above  equation  has  the  same  form  as  Equation  (4.21)  when  re-arranged. 
The  above  form  (in  contrast  to  the  form  given  in  Equation  (4.21))  was  chosen  in  order  to 
simplify  the  series  expansion  analysis  that  will  be  undertaken  below. 

Equation  (4.25)  represents  the  consistency  correction  for  a  using  sxx  and  syy  (estimates 
of  the  true  input  variations),  and  sxy  (an  estimate  of  the  covariance  of  the  true  inputs). 
These  estimates,  however,  use  the  exact  noise  variances  suu  and  svv.  What  if  we  have  to 
“guesstimate”  these  variances  themselves?  Below  we  perform  a  sensitivity  analysis  on  the 
estimates  of  the  noise  variances. 
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4.4.1  Sensitivity  of  LS  Correction  to  Errors  in  Noise  Estimation 

Let  us  estimate  the  true  noise  variance  suu  and  svv  by  suu  and  %v  respectively.  (Notice 
that  we  have  hatted  the  variance  symbol  s  instead  of  the  noise  terms  u  and  v,  since  u 
and  v  would  imply  exact  measurements  of  the  noises.)  We  can  then  write  down  a  relative 
relation  between  these  true  and  estimated  variances  as 

$uu  ~  4  )  and  svv  —  sVu(l  4-  Jv), 

where  5U  and  Sv  are  the  relative  errors  in  our  estimates  of  the  noise  variance  suu  and 
svv  respectively.  We  see  that  5U ,  for  example,  really  does  represent  relative  error  upon 
re-arranging  the  above  equation  to  give  5U  —  (suu  —  suu)/suu. 

Substituting  the  above  two  equations  into  the  consistency  correction  formula  given  by 
Equation  (4.25),  and  re-arranging,  gives  that 


and  4*='^: 

Vx  b  \/ Sxx  $uu 


Performing  a  Taylor’s  series  expansion  of  Equation  (4.26)  about  the  relative  errors  Su  =  0 
and  Sv  =  0  gives 


(1  -<j>r)-6u  1  +  rjx 


(1  ~<t>r) 


+  $vr  +  % 


{4>-r) 


h  0(5U)  +  0(8V)  +  0(5U8V)  +  0(e).  (4.27) 


Again,  note  the  significance  of  the  special  case  where  the  importance  weightings  are 
equal  (that  is,  </>=!),  for  which  the  above  equation  becomes 


«  =  s(1  +  - 


(1  -  f)  -  8U  1  + 


+  6vr  1  + 


"  ““r  1  (i+f)j  r  ■  (i+f)jj; 

+  0{5U)  +  0(8V)  +  0(SU8V)  +  0(e).  (4.28) 

We  can  now  clearly  see  that  if  |5U|,  \5U\  |1  —  r|,  then  our  estimate  of  the  exact  coefficient 

is  a  good  one.  Notice  that,  unlike  Equation  (4.27),  the  factors  multiplying  5U  and  5V  in 
Equation  (4.28)  are  bounded  as  r  tends  to  unity. 

The  large  factors  multiplying  5U  and  (large  as  r  — ►  1)  originate  from  the  denominator 
of  Equation  (4.26).  In  fact,  our  least  squares  (LS)  correction  becomes  singular  when  this 
denominator  is  zero.  Writing  Equation  (4.26)  in  terms  of  5U  gives 


[l  +  co-A-441  +0(e), 


(4.29) 
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where 


_  rjxjpr_ 

°  (1  -r2  +  rjxy 

(1  -  0r)  +  (1  +  rjy)5v 

ci  — - — - ,  and 

<j>  r 

C2  = _ (1  -  r_ 2)  +  (1  -  f 2  +  rjy)6v _ 

(!  ~  +  Vx)  +  (1  -r2  +  rjx  +  rjy  +  rjxrjy)5v 

Similar  expressions  arise  if  we  re-write  Equation  (4.26)  in  terms  of  5V  instead  of  Su. 

Figure  4.3  shows  the  general  form  of  Equation  (4.29).  Note  the  singularity  at  6U  =  c2, 
and  that  the  sign  of  Co  determines  whether  a/a  tends  to  negative  or  positive  infinity 
(co  <  0  or  co  >  0  respectively)  as  5U  approaches  the  singularity  from  the  positive  side 

(<*u  -*•  4). 


Figure  4-3:  Sensitivity  of  the  corrected  LS  solution  to  errors  in  the  noise 
estimate,  where  a/a  =  1  +  c0(ci  -  Su)/(c2  -  6U)  +  0(e).  The  sign  of 
Co  determines  whether  the  form  shown  (cq  >  0)  or  the  inverted  form 
(cq  <  0/  is  produced. 


Let’s  investigate  the  conditions  under  which  the  LS  correction  is  relatively  insensitive 
to  errors  in  our  noise  estimate.  We  have  just  seen  that  under  some  conditions  bad  noise 
estimates  have  the  potential  to  send  our  LS  correction  to  infinity.  Consider  the  form  of 
the  Taylor’s  series  expansion  of  corrected  LS  to  noise  estimate  sensitivity,  that  is  Equa¬ 
tion  (4.27).  We  want  to  determine  when  the  factors  multiplying  Su  and  Sv  remain  bounded, 
since  under  this  bounded  condition  our  corrected  LS  is  insensitive  to  errors  in  noise  esti¬ 
mates.  The  last  statement  is  true  provided  |JU|,  |<5„  <C  |1  —  <pr\,  since  we  essentially  want 
to  form  an  argument  that  says  we  can  ignore  the  terms  Su  and  5V  and  their  multiplying 
factors  in  Equation  (4.27). 

From  Equation  (4.27)  the  factor  multiplying  5U  is  (ignoring  the  addition  of  unity  since 
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it  is  bounded) 

^  1  -$r  _  1  f}x  (!-</>?) 

^  1  —  r2  (1+f)  (1-f) 

The  first  fraction  on  the  right  hand  side  is  bounded  as  r  —>  1,  so  we  need  only  consider 
the  second  fraction  on  the  right  hand  side.  We  want  to  determine  when  this  fraction  is 
bounded;  to  be  pragmatic  let’s  choose  that  bound  to  be  unity.  We  then  have  that  the 
condition  for  boundedness  is 

1 1  —  <j>r\  <  Jz-\1  —  r| . 

Vx 

It  can  be  shown  that  the  above  condition  is  satisfied  by  either  of  the  two  conditions 


<?<  l  +  (l;?)Afc  (forF<1) 

r  v 

(4.30) 

or 

i  +  (l;f)/¥.<j<i-(i:f)/5.  (forr>1). 

r  r 

(4.31) 

Similarly  the  factor  multiplying  Sv  in  Equation  (4.27)  (ignoring  r 
terms  are  at  least  bounded,  if  not  near  unity)  is  given  by 

and  <f>  since  these 

-  4>~r  $  Vy  (1  -  r/$) 

%l-f2  (1+r)  (1-f)  ' 

Again,  we  need  only  consider  the  second  fraction  on  the  right  hand  side,  which  is  bounded 
by  unity  under  either  of  the  two  conditions 

- -  <  6  <  - 7 —  (for  r  <  1), 

1  +  (1  -  r)/r)y  1  -  (1  -  r)/r)y 

or 

(4.32) 

1  -  (1  -  r)/rjy  ~  ^  ~  1  +  (1  -  r)/rjy  ^°r  T  ~ 

(4.33) 

The  conditions  given  by  Equations  (4.30)-(4.33)  are  mutually  exclusive  (except  for 
the  end-points)  if  fjx ,  fjy  >  1,  as  we  would  expect.  This  last  statement  simply  means  that 
for  a  large  amount  of  noise  our  correction  to  the  LS  solution  is  sensitive  to  errors  in  our 
noise  estimation.  Any  value  of  the  importance  weighting  4>  that  satisfies  either  of  the 
conditions  given  by  Equations  (4.30)  or  (4.31)  as  well  as  either  of  the  conditions  given  by 
Equations  (4.32)  or  (4.33)  guarantees  that  the  corrected  LS  is  insensitive  to  errors  in  noise 
estimation;  this  follows  since  we  are  essentially  showing  that  the  ratio  a/ a  is  on  the  flatter 
part  of  Figure  4.3.  As  a  final  point,  note  that  these  four  conditions  on  <fi  become  harder 
and  harder  to  satisfy  as  the  approximate  correlation  r  tends  to  unity;  again  something  we 
would  expect  since  the  measurement  data  set  is  becoming  more  ill-conditioned. 

Figure  4.4  shows  how  the  normalised  error  (in  determining  the  coefficients  a  and  b ) 
varies  with  errors  in  noise  estimation  5U  and  5V.  Both  noise  estimation  errors  were  varied 
simultaneously,  that  is,  5  =  Su  =  Sv,  which  explains  why  the  minimum  normalised  error  is 
not  zero.  The  lines  shown  in  Figure  4.4  merely  connect  the  associated  points,  hence  the 
jagged  appearance  of  the  lines  near  the  minima. 
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Figure  4-4:  Plots  of  normalised  error  versus  errors  in  noise  estimation 
8  =  6U  =  6V,  for  several  different  approximation  orders. 


The  conditions  that  guarantee  insensitivity  to  noise  estimates  (Equations  (4.30)-(4.33)) 
are  only  valid  under  the  assumption  |<5U  | ,  |<5U|  <C  1  —  <pr .  This  assumption  is  very  restrictive 
for  ill-conditioned  systems.  In  fact  it  is  so  restrictive  as  to  throw  a  shadow  of  doubt  over 
the  corrected  LS  technique  for  highly  ill-conditioned  systems. 

After  defining  the  terms  bias  and  consistent,  we  showed  that  the  LS  technique  was 
biased  as  well  as  inconsistent,  and  gave  a  correction  for  the  inconsistency.  In  real  life  we  will 
usually  not  know  the  errors  in  our  measurements  exactly,  for  if  we  did  then  we  would  know 
the  exact  measurement!  As  such  we  have  to  estimate  the  noise  in  the  measurements.  We 
showed  that  the  LS  correction  was  sensitive  to  noise  estimations,  and  developed  conditions 
under  which  the  correction  might  be  sensitive.  We  saw  that  bad  noise  estimates  could 
easily  send  the  correction  to  infinity. 


4.5  LS  Approximation  with  both  Input  and  Output  Noise: 
One-Dimensional  System 

We  briefly  look  at  the  special  case  of  a  single  input,  and  show  that  the  results  developed 
so  far  agree  with  the  large  amount  of  literature  on  this  subject. 

The  exact  linear  system  z  =  ax  is  modelled  by  Ketellapper  [16]  using  the  approximate 
linear  system  z  =  ax.  which  is  equivalent  to  the  one-dimensional  versions  of  Equations  (4.1) 
and  (4.2)  respectively.  Furthermore,  the  linear  relationship  between  the  measured  signal, 
true  signal,  and  noise,  given  by  Equation  (4.3),  still  holds.  Ketellapper  assumes  normal 
distributions  for  true  input  signal  Xi,  and  the  input  and  output  noises  Ui  and  W{,  so 
that  the  (zuXi)T  for  i  =  1. ...,  rn  pairs  are  independent  drawings  from  a  bivariate  normal 
distribution.  The  variance  of  the  input  noise,  suu,  is  assumed  to  be  known. 

The  least  squares  (LS)  estimator  (which  Ketellapper  terms  the  “ordinary  LS”)  is  given 
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by  aLS  =  s-xz/sxx  (this  result  is  well  known,  see  for  example  Spiegel  [32]).  Using  Equa¬ 
tion  (4.23)  we  can  write  this  LS  estimate  as 

aL8=  ■— f-  ■  (4.34) 

&xx  i  Suu 

Ketellapper  also  states  that  the  corrected  LS  estimate  is  given  by  aCLs  =  W(  $xx  Sun)1) 
which  although  less  well  known  has  being  around  for  some  time  [12,  p.  170].  Again  using 
Equation  (4.23)  we  can  re-write  this  expression  as 


aCLS  — 


(4.35) 


Setting  the  correlation  to  zero  (r  =  0  since  we  have  only  one  input  signal)  and  substi¬ 
tuting  aLS  in  Equation  (4.34)  for  a  in  Equation  (4.21)  we  obtain  Equation  (4.35).  Thus, 
the  inconsistency  correction  we  developed  in  §  4.4  is  in  fact  what  Ketellapper  terms  the 
corrected  LS  estimate. 


If  we  let  r)  denote  the  noise  to  signal  ratio  suu/sxx ,  then  Ketellapper  shows  the  incon¬ 
sistency  of  the  LS  technique  through  the  equation 


£oo(2 ls)  ~  0, 


V 

(i+*7)r 


(4.36) 


where  the  symbol  ~  and  the  subscripted  infinity  symbol  on  the  expectation  symbolise 
an  asymptotic  result.  The  above  expression  is  a  special  case  of  Equation  (4.18)  with  the 
correlation  set  to  zero  (r  =  0).  Furthermore,  the  variance  of  the  LS  estimator  (a  special 
case  of  a  result  from  Schneeweifi  [30,  Eq.  (5.8)])  is  given  by 


varoc(aL5)  ~  — 
m 


+  a 


(l+»?)  (l  +  »?)2 


(4.37) 


remember  that  m  denotes  the  number  of  points  used  to  develop  the  LS  approximation. 
Ketellapper  also  gives  the  asymptotic  variance  of  the  corrected  LS  estimate  as 


var  ^acLs) - 

m 


^^(1  +r))  +  a2r)(  1  +  2  rj) 


(4.38) 


As  we  would  expect,  when  the  noise  in  the  input  signal  tends  to  zero  (77  — >  0),  the  asymp¬ 
totic  variances  of  the  ordinary  LS  (Equation  (4.37))  and  corrected  LS  (Equation  (4.38)) 
converge.  Furthermore,  the  asymptotic  bias  (inconsistency)  of  the  ordinary  LS  solution 
(Equation  (4.36))  tends  to  zero  as  77  — »  0. 

As  can  be  seen  from  the  two  asymptotic  variances  shown  above,  the  ordinary  LS 
estimate  (Equation  (4.37))  has  less  variance  than  the  corresponding  corrected  LS  estimate 
(Equation  (4.38)),  remember  that  rj  >  0.  It  is  for  this  reason  that  Ketellapper  introduces 
a  measure  of  “goodness”,  which  he  terms  the  “asymptotic  mean  square  error”  (AMSE). 
The  AMSE  is  defined  as  the  sum  of  the  asymptotic  variance  and  the  squared  asymptotic 
bias,  that  is, 

AMSE(a)  =  var  (a)  +  [£(a)  —  a]2  .  (4.39) 

At  first  sight  the  AMSE  measure  given  above  may  seem  questionable,  after  all  it  weighs 
variance  and  the  square  of  the  bias  equally.  The  derivation  of  the  form  shown  above  (see 
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Equations  (5.8)-(5.9))  shows  that  in  fact  the  mean  square  error  (as  Kendall  and  Stuart  [15] 
term  it)  is  a  measure  of  “closeness”  to  the  true  solution. 

Ketellapper  derives  a  condition  for  when  to  use  ordinary  LS  estimate  in  preference  to 
the  corrected  LS  estimate.  The  condition  AMSE(aLi>.)  <  AMSE(acts)  is  satisfied  if  and 
only  if 

fww  >  q2  (m  ~  4)  —  7(5  +  2?7) 
sXx  (1  -f-  t?)(2  +  77) 

For  m  large  and  77  small  the  above  condition  becomes  sww/sxx  >  a2??  m/2.  Ketellapper 
concludes  that  the  strategy  to  always  use  corrected  LS  is  appropriate,  though  not  optimal 
(since  under  some  conditions  the  ordinary  LS  procedure  may  perform  better  under  the 
AMSE  measure). 

In  this  section  we  have  shown  the  relation  between  several  one-dimensional  results 
found  in  the  literature  and  the  theory  that  we  have  developed. 


4.6  LS  Approximation  with  both  Input  and  Output  Noise: 
n-Dimensional  System 


We  now  generalise  the  results  of  the  preceding  sections  to  an  n-dimensional  system, 
restricting  correlation  between  inputs. 

Analogous  to  the  preceding  section  introduce  a  linear  function  of  n  variables 


^  1  ?  5  •  •  •  ?  Xn ) 


which  we  want  to  approximate  using 


n 


ajxv 

3= 1 


n 

%{x I?  •  •  •  j  'E’ti)  ^  ^  Q*j Xj . 

3= 1 

Both  the  dj  and  the  aj  are  constants.  Given  a  set  of  m  data  points  pi  =  {xn^x^, . . . ,  xin;  Zj) 
we  want  to  determine  the  constants  aj.  The  measured  input  and  outputs  may  be  decom¬ 
posed  into  true  input  and  output  plus  input  and  output  noise  as  follows 


x%j  —  Xij  Uij  and  z%  —  z%  - f- 

for  %  —  and  j  =  1,2 where  X{j  and  z\  are  the  true  inputs  and  true 

outputs,  and  u^j  and  are  their  respective  noise  terms. 

For  simplicity  we  assume  that  the  noise  terms  are  uncorrelated  with  the  input,  output, 
and  each  other  (that  is,  s*Ui  —  0,  where  *  is  any  input,  output,  or  noise  term  except 
ui).  This  assumption  is  not  very  restrictive,  since  if  e  =  max(|s*w.|),  then  the  result  from 
the  two-dimensional  section  still  holds,  namely  var(e)  =  0(l/m).  The  least  squares  (LS) 
problem  is  given  by  the  linear  system  shown  below 


’(i  +  m) 
721 

712 

(1  +  72)  •• 

..  ^  ^ 

1 

V 

S2 

'  1 

721 

712  *  ■ 

1  •• 

•  ^  ^ 

_ 1 

~ai 

CL2 

7nl 

7n2 

•  (!+7n)_ 

_7nl 

7n2  •  ‘ 

•  1  _ 

&n_ 
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where  rji  =  sUiUJsXiXi  and  7^  =  sXiXj/sXiXi.  The  term  rji  may  be  thought  of  as  the  ith 
noise  term,  and  7 y  may  be  thought  of  as  the  normalised  covariance  between  the  ith  and 
the  jth  inputs  (normalised  against  the  iih  variance).  Although  the  original  covariance 
matrix  is  symmetric,  the  normalised  covariance  matrix  is  no  longer  symmetric  since  we 
are  dividing  each  row  of  the  covariance  matrix  by  the  diagonal  term. 

The  structure  of  the  LS  system  of  equations  shown  above  is  more  clearly  seen  if  we 
write 

(J  +  G  +  H)a  =  (J  +  G)a,  (4.40) 

where  a  =  [Si,a2,  •  •  •  ,an]T  and  a  =  [ai,a2,  •  •  •  ,  an]T  are  vectors.  Furthermore,  I  is  the 
identity  matrix,  H  =  diag(r/i,  772, ...,%)  is  the  noise  matrix  (which  has  elements  ha  =  rji 
and  hij  =  0  for  i  ^  j),  and  G  is  the  correlation  matrix  (which  has  elements  gu  =  0  and 
gij  =  7^  for  z  ^  j ) .  These  three  matrices  are  n-by-n  matrices,  and  H  is  additionally  both 
diagonal  and  positive  definite.  From  the  above  form  we  clearly  see  that  if  H  =  0,  then 
a  =  a. 

In  their  excellent  treatise  on  the  total  least  squares  method  [35]  (see  §  5  for  more  details 
on  this  method),  Van  Huffel  and  Vandewalle  also  show  that  the  LS  technique  is  a  biased 
estimate.  Consider  the  system  of  equations 


Xa  =  z, 


(4.41) 


where  the  matrix  X  G  Rmxn  contains  the  exact  input  values  X{j  (that  is,  measurements 
without  errors)  of  the  n  input  variables  and  m  samples,  the  vector  a  G  Rnxl  contains  the 
exact  coefficients,  and  the  vector  z  G  Rmxl  contains  the  exact  output  values.  The  classical 
LS  solution  then  is  given  by 

(4.42) 

where  X  =  X  4-  C7,  a,  and  z  =  z  +  w  are  the  measured  values,  and  hence  approximate 
X,  a,  and  z  respectively. 

Van  Huffel  and  Vandewalle  (combining  the  work  of  two  other  authors)  obtain  the 
following  result  for  convergence  of  the  LS  solution  [35,  p.  232] 


dis  —  plim  Iji  (^ a?  T 


ci  T  Sx 


(4.43) 


In  the  above  equation  “plim”  is  the  probability  limit  (that  is,  converging  in  probability,  see 
for  example  [15,  p.  3])  and  In  G  Rnxn  is  the  identity  matrix.  Sx  G  Rnxn  is  the  covariance 
matrix  of  the  exact  inputs,  Su  G  RnX71  is  the  covariance  matrix  between  the  input  errors, 
and  Suw  G  RnXl  is  the  covariance  vector  between  the  input  errors  and  the  output  error. 
These  two  matrices  and  vector  are  defined  respectively  by 

Sx  =  —  XTX,  Su  =  —UTU,  and  Suw  =  —XTz.  (4.44) 


Assuming  Su  =  a2In  (that  is,  the  input  errors  are  independent  and  have  the  same 
variance  a2)  and  Suw  =  0  (that  is,  output  and  input  errors  are  independent)  then  Equa- 
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tion  (4.43)  simplifies  to  [35,  p.  232] 


O'LS 


2  r  N-1 


In  ~  plim  a2  (, Sx  +  <r2In) 
m— ►  oo 


a 


In  -  a2  (  plim  S ; 


-l 


a, 


where  Sg  G  Rnx"  is  the  covariance  matrix  of  the  measured  inputs,  that  is  S%  =  (1  /m)X  T X 
Thus  large  LS  biases  result  from  large  errors  (either  Su  or  a  large),  an  ill-conditioned  ma¬ 
trix  X  (that  is,  inputs  almost  collinear),  or  a  oriented  close  to  the  lowest  right  singular 
value  of  X  (that  is,  a  points  towards  the  null  space  of  X). 

If  the  covariance  matrix  Su  and  covariance  vector  Suw  are  known,  then  the  LS  bias 
can  be  removed  using  the  corrected,  LS  technique 


Clr 


,s=(xTX 


mS , 


( 


X  z  —  rriS, 


o- 


(4.45) 


Assuming  input  and  output  noise  are  uncorrelated  (that  is,  Suw  =  0)  then  the  two- 
dimensional  version  of  the  above  equation  reduces  to  Equation  (4.25). 

Van  Huffel  and  Vandewalle  show  that  the  under  the  assumption  of  independently  and 
identically  distributed  input  and  output  errors  (that  is,  Su  =  a2In  and  var (z)  =  a),  the 
corrected  LS  and  TLS  solutions  asymptotically  yield  the  same  consistent  estimate  a  of  the 
exact  solution  a.  Naidu  [22]  independently  shows  that  the  corrected  LS  (or  alternative  LS 
as  Naidu  terms  it)  is  consistent. 

Compare  the  LS  solution  given  by  Equations  (4.42)  with  the  corrected  LS  solution 
given  by  Equation  (4.45).  The  above  expression  may  be  thought  of  as  a  “method-of- 
moments”  estimator  [35]  since  £{X  T X  -mSu)  =  XT X  and  £{Xz-mSuw)  =  XTXa, 
where  £(•)  is  the  expectation  (or  mean).  Johnston  [12,  Eq.  (6-47)]  gives  the  same  result 
for  the  corrected  LS,  and  goes  on  to  state  that  the  intercept  term  a0  can  be  estimated  by 
passing  the  solution  plane  through  the  sample  means,  or  the  centroid  as  Nievergelt  [23] 
terms  it.  (For  further  details  on  this  statement  about  the  centroid  see  §  5,  especially 
Figure  5.1.) 

We  now  look  at  special  cases  of  the  LS  system  (4.40). 


4.6.1  Each  Input  Correlated  with  at  Most  One  Other  Input 

To  make  the  problem  more  tractable,  we  assume  that  all  inputs  are  correlated  with 
at  most  one  other  input.  For  example,  if  7y,7ji  +  0,  then  7ifc  =  ~jki  =  0  (for  k  ±  j)  and 
7 kj  ~  Hjk  ~  0  (for  k  ^  i),  that  is,  all  7’s  on  the  ith  row  and  columns,  and  jth  row  and 
column  of  the  matrix  G  are  zero  except  for  jij  and  7j, .  This  restriction  implies  that  we 
have  at  most  n  (for  n  even)  orn-1  (for  n  odd)  input  correlations.  Below  is  an  example 


40 


DSTO-RR-0204 


of  this  restriction  for  a  7-by-7  matrix 


0  0  0  0  0  716  0  " 

0  0  0  0  725  0  0 

0  0  0  0  0  0  737 

0  0  0  0  0  0  0 

0  752  o  0  0  0  0 

761  0  0  0  0  0  0 


(4.46) 


76i  0  0  0  0  0  0 

0  0  773  0  0  0  0  _ 

Note  that  re-arranging  the  rows  of  the  above  system  yields  a  diagonal  matrix.  We  now 
see  that  the  condition  of  “correlation  to  at  most  one  other  input”  is  satisfied  whenever  the 
normalised  covariance  matrix  is  diagonisable. 

If  we  further  assume  that  all  noise  terms  rj  have  the  same  magnitude  (that  is,  7]i  =  77), 
then  the  general  solutions  are 


ai  =  cii  1  -  r) 


(1  +  7?-^7u) 

(1  +  V  ~  7y)(l  +V  +  lij ) 


(4.47) 


ai  1  “  V 


(1  +  T?~|^) 

(!  +  »?-  7ji)(l  +  V  +  Iji) 


(4.48) 


if  7 ik  —  7ki  =  0  (for  k  ^  j )  and  7 ^  =  7 kj  =  0  (for  Notice  that  Equation  (4.47)  has 

the  same  form  as  Equation  (4.18). 

On  the  other  hand,  if  7^  =  7 ki  =  0  for  k  —  1,2, ...  ,n,  then  the  above  expressions 
simplify  to 

s'  =  iT7  <4'49> 

Using  the  example  correlation  matrix  given  by  Equation  (4.46)  and  Equations  (4.47)- 
(4.49),  the  coefficients  of  the  least  squares  (LS)  approximation  2,  are  given  by 

Si/ai  1  -  77(1  +  r)  -  ^716)/  [(1  +  V  ~  7i6)(l  +  V  +  7i6)] 

a2/a2  1  -  77(1  +  77  -  ^725)/  [(!+»?-  725) (1  +  V  +  725)] 

03/03  1  -  77(1  4-  7?  -  ^737)/  [(1  +  V  ~  737) (1  +V  +  737)] 

04/04  =  04/(1+77) 

01/05  1  -  77(1  +  77  -  ^752)/  [(1  +V~  752) (1  +  V  +  752)] 

Oi/o6  1  -  77(1  +  7?  -  fj76i)/  [(1  +  7?  -  76l)(l  +  77  +  76i)] 

01/07  1  -  77(1  +  77  -  ^773)/  [(!+»?-  773)  (1  +  V  +  773)] 

If  all  the  rji  do  not  have  the  same  magnitude,  then  the  more  general  forms  of  the 

two-dimensional  solutions  given  by  Equations  (4.16)  and  (4.17)  may  be  substituted  for 
Equations  (4.47)  and  (4.48)  respectively  (remembering  to  make  the  appropriate  change  in 
notation). 
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4.6.2  Every  Input  is  Correlated  to  All  Other  Inputs 


The  approximation  we  make  in  this  section  is  not  as  useful  as  the  results  of  the  previous 
section,  since  we  assume  that  all  correlations  are  of  the  same  magnitude  (quite  a  severe 
restriction). 

Assume  that  every  input  is  correlated  to  all  other  inputs  by  exactly  the  same  amount, 
that  is,  7 ij  =  7.  Furthermore,  assume  that  all  noise  terms  have  the  same  magnitude,  that 
is,  rji  =  t].  Under  these  two  assumptions,  Equation  (4.40)  reduces  to 


(I  +  7  Z  +  r/I)a  =  (I  +  /yZ)a , 

where  Z  is  an  n-by-n  matrix  that  is  filled  with  unity  except  for  the  diagonal  terms  which 
contain  zeroes,  that  is,  the  elements  of  Z  are  ztj  -  1  -  Sij  (where  Sit  =  1  and  %  =  0  for 
*  7^  j)-  The  least  squares  (LS)  coefficients  a-i  can  be  shown  to  have  the  solution 


Cbj 


dill- 7) 


l+r?-7(l-n  +  E"=i|) 


(1  +  i]  -  7)  [1  +  rj  +  (n  -  1)7] 


for  i  =  1,2, . . .  ,  n. 


We  began  this  section  by  generalising  the  two-dimensional  results  to  n-dimensions. 
The  results  obtained  earlier  for  the  two-dimensional  case  were  found  to  have  a  natural 
extension  to  higher  dimensions.  In  particular,  the  LS  and  the  corrected  LS  procedures 
were  seen  to  be,  respectively,  inconsistent  and  consistent.  We  also  developed  solutions  for 
special  cases  of  the  correlation  matrix. 


4.7  Effects  of  Noise  on  the  Matrix  Technique 

In  this  section  we  show  why  the  matrix  technique  out-performs  the  vector  technique 
for  noisy  systems  with  correlated  inputs. 

Consider  the  two-input  (three  strain  gauge)  system  shown  in  Figure  3.1  (on  page  10). 
We  change  the  notation  of  the  coefficient  in  this  section,  so  that  the  true  and  approximate 
stresses  are  related  to  the  external  loads  by 


True 

Approximate 

0*1  =  A\PX  +  B\Py , 

o~i  —  d\P e  -j-  b\Py  , 

(4.50) 

0“2  =  A^PX  +  B^Py, 

0"2  =  &2 Px  +  b2Py  , 

(4.51) 

00  =  AqPx  +  B0Py , 

and 

ao  =  oqPx  +  boPy  . 

(4.52) 

The  true  solutions  are  shown  on  the  left  and  the  approximate  solutions  are  shown  on  the 
right. 

Solving  for  the  true  loads  Px  and  Py  in  terms  of  the  stresses  o\  and  <72  using  Equa¬ 
tions  (4.50)  and  (4.51),  then  substituting  this  result  into  a0  in  Equation  (4.52)  gives 

00  =  Cl  0-1  +  C202, 
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where 


A0B2  -  A2B0  AqB\  -  A\Bo 

Ci  =  — — - — —  and  co  =  — — - - — — .  4.53) 

A\B2  —  A2B1  A1B2  —  A2Bl 

(Note  these  same  expressions  were  derived  earlier  using  a  different  notation  for  the  co¬ 
efficients,  see  Equations  (3.1)  and  (3.2).)  Similarly  for  the  approximate  system  we  have 
that 

<?o  =  C\a\  +  C2O2, 


where 


So  62  ~  S260 
<2162  —  #2^1 


and  S2  = 


So^i  —  Si  bo 

Si  fe2  -  S2&1 


(4.54) 


From  the  least  squares  (LS)  section,  Equations  (4.9)  and  (4.10),  we  know  the  solution 
for  a  noisy  two-input/single-output  system  is  given  by 

<7*  —  CL*PX  -f-  b^Pyj 


where 


A*  (sxxSyy  &xy  ”F  SyySxx)  A  B*SyySXy 

( suu  A  sxx)(svv  A  Syy)  ~~  sxy 


A  0(e) 


B*(SXXSyy  SXy  A  SuuSyy )  A  A^SyyS 


x:xayy  °xy  ■  °uu°yy )  \  *ri-*'->uu0xy 
(Syu  A  Sxx)(sVy  A  Syy)  ^xy 


A  0(e), 


Introducing  the  notation 


51  —  $xxsyy  sXy  “F  SyVSxx,  —  SyVSxy,  S3  —  (suu  "F  Sxx)(svv  +  Syy)  SXy> 

^4  ”  SxxSyy  &Xy  A  SuuSyy ->  and  55  —  SuuSxyi 

we  can  then  express  the  coefficients  S*  and  b *  more  concisely  as 

^  A*S\  +  B*S2  v  ,  T  B*S4  +  ^4*^5  .  . 

a*  = - h  0(6)  and  6*  = - b  0(6), 

S3  s  3 

Substituting  the  above  expressions  into  Equation  (4.54)  gives  the  numerator  and  de¬ 
nominator  of  ci  respectively  as 

a0b2  -  a2b0  =  ( A0B2  -  A2B())SlH  ~S2S(>  +  0(e) 

S3 


Si  62  —  S2&I  =  (A1B2  —  A2B1)  2  +  0(6). 


Hence  the  coefficient  Si  has  the  form 


A0B2  -  A2B0  f  A 
C1  =  MB2-A2B,+°{e)' 
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which  reduces  to  the  form 

Ci  =  ci  +  0(e ) 

if  Equation  (4.53)  is  used.  Similarly  the  coefficient  C2  has  the  form  62  =  C2  +  0(e ).  In 
other  words,  the  matrix  technique  is  not  noise  sensitive  (to  order  e)  when  the  inputs 
are  correlated.  We  have  eliminated  the  troublesome  denominator  S3  using  additional 
information. 

The  above  analysis  explains  the  error  plots  shown  in  Figure  3.7.  As  can  be  seen  the 
vector  technique  is  an  inconsistent  estimator  of  the  true  solution,  whereas  the  matrix 
technique  is  a  consistent  estimator.  Furthermore,  the  slope  of  the  approximation  order 
versus  the  normalised  error  is  -1/2  as  predicted  from  the  order  e  analysis  in  §  4.2  (see 
Equation  (4.11)). 

In  the  above  analysis  we  have  implicitly  assumed  that  the  denominator  of  a*  and  is 
non-zero,  that  is  S3  =  (sim  +  sxx)(svv  +  syy)  —  sxy  ^  0.  This  assumption  is  true  for  any 
real  system,  since  suu ,  svv  >  0,  that  is,  both  inputs  have  noise.  However,  53  ^  0  is  satisfied 
even  for  ideal  noiseless  systems  (sUM  =  svv  =  0)  provided  the  two  inputs  are  not  totally 
correlated,  that  is,  provided  Px  ^  Py  for  at  least  one  point. 

In  this  section  the  superior  performance  of  the  matrix  technique  was  shown  to  be  due 
to  the  elimination  of  the  troublesome  denominator  (a  function  of  correlation),  making  the 
matrix  technique  consistent. 


4.8  Relation  between  Condition  Number  and 
Output  Correlation 

In  this  section  we  investigate  the  relationship  between  the  condition  number  of  the 
underlying  linear  system  and  the  correlation  of  the  measured  stresses  (or  outputs).  We  will 
see  that  an  ill-conditioned  system  does  not  necessarily  imply  output  correlation.  Neither 
is  the  reverse  statement  true,  that  is,  measured  stress  (or  output)  correlation  does  not 
necessarily  imply  an  ill-conditioned  system. 

We  begin  by  determining  the  condition  number  of  the  simple  five  member  truss  we 
investigated  in  §  3.4.  We  numerically  simulated  the  measurement  of  stress  of  that  truss, 
and  using  these  “measured”  stresses  (true  stress  plus  noise)  in  members  OA  and  AB  we 
predicted  the  stress  in  member  OC.  From  Equations  (3.3)  and  (3.5)  we  have  that 


a  =  Ap 

where  a  =  [<rOA,crAB]T,  p  =  [PX:  Py}T\  and 


X1-*) 

~T( 

i 
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(\  -  2\ 
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«21  ^22 

Lv  e) 

0)1 

(4.55) 


(4.56) 


Using  the  results  from  §  2.3  we  can  determine  the  condition  number  of  the  above  system. 
We  know  that  for  an  infinite  condition  number  (of  a  two-by-two  system)  we  require  that 
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|A|  =  0,  which  yields  the  two  conditions  a  =  7  and  /?  =  7.  These  two  conditions  were 
shown  as  special  cases  of  the  truss  (see  Figure  3.5,  which  shows  either  member  AC  or  BC 
as  vertical). 

Let’s  now  consider  the  correlation  between  the  two  outputs  (aOA  and  aAD).  From 
Equation  (4.15)  the  correlation  coefficient  between  two  variables  x  and  y  is  defined  as 
r(x,y)  =  sxyl y/sxxsyy,  where  sxy  =  (1/n)  Y17=i  x iVi  covariance  between  x  and  y  for 

n  sample  points.  For  conciseness  define  the  following  notation:  the  covariance  between 
external  loads  sxx  =  co v(Px,Px),  syy  =  cov(Py,  P^),  and  sxy  —  co v(Px,Py);  the  covari¬ 
ance  between  stresses  sn  =  cov(aOAlaOA)1  S22  =  cov(cr^jg,  aAB),  and  s\ 2  =  cov (a OAl  crAB); 
and  the  correlation  between  the  two  outputs  ra  —  r(crrM,  (Jab)-  Using  the  definition  of 
covariance  we  then  have  that 

1  n 

$11  =  —  T  iflllPx  +  al2 Py)2 
n 

=  (X^Sxx  +  2,Qji\CLi2Sxy  T"  U-12 Syy,  (4.57) 

^22  =  aiisxx  +  2a2ia22  sxy  +  a^Syy,  and  (4.58) 

£>12  ~  Q'llOj2lSxx  +  (^11^22  "F  ttl2^2l)  $xy  +  ^12^22  Syy,  (4.59) 

where  an,  ai2,  a2i,  and  a22  are  the  coefficients  of  the  matrix  A  given  in  Equation  (4.56). 
We  may  then  write  down  the  correlation  coefficient  between  the  two  outputs  aOA  and  aAB 
as 


2  [S;r:r  (^  +  rf)&xy  +  ^TSyy]  T  [s##  ^OtSxy  T  Ol  syy  ^ 

a  (sxx  -  2 asxy  +  a1  Syy)  {(/?  -  7)  [(/?  -  7 )sxx  +  27(0;  -  (3)sxy\  +  (a  -  (3)2j2syy}  * 

(4.60) 

We  want  to  determine  when  the  two  outputs  aOA  and  aAB  are  correlated  (that  is, 
ra  =  1).  Using  the  above  equation,  r2  =  1  is  satisfied  whenever  the  condition 

/?2(a  -  7 )2(4y  -  SXXSyy )  =  0 

is  met.  Thus  there  are  three  ways  the  outputs  a OA  and  a AB  can  be  completely  correlated; 
if  (3  =  0  or  a  =  7  (two  of  the  three  conditions  for  ill-conditioning),  or  if  the  two  forcing 
loads  Px  and  Py  are  completely  correlated  (that  is,  s2y  —  sxxsyy  =  0).  However,  f3  =  0 
cannot  be  satisfied  under  the  conditions  imposed  earlier  (|/3|  >  0,  see  page  15). 

We  now  see  that  ill-conditioning  does  not  imply  output  correlation  for  the  truss  system 
developed  in  §  3.3.  For  example,  if  /3  =  7  then  k  =  00  but  rG  ^  1.  Furthermore,  output 
correlation  does  not  imply  system  ill-conditioning,  for  example,  if  Px  oc  Py  then  ra  =  1  but 
k  ^  00.  These  results,  however,  may  be  due  to  the  fact  that  the  truss  system  developed 
in  §  3.3  is  (coincidentally)  a  special  case,  as  we  now  show. 

Let’s  consider  the  same  system  given  by  Equation  (4.55),  except  the  matrix  A  is  now 
given  by  a  general  two-by-two  matrix.  By  definition  the  square  of  the  correlation  coefficient 
is  r2  =  s22/(suS22),  using  Sn,  s22,  and  512  as  defined  by  Equations  (4.57)-(4.59)  we  obtain 

^2  __  [alla2lSxx  +  (^11^22  +  &12a2l)  Sxy  +  ^12^22Syy]2 

{p\\sxx  T  2anai2 sxy  T  ( p2l$xx  "F  2a2ia22Sxy  T  ^22Syy) 
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Solving  for  the  case  when  the  two  outputs  crOA  and  aAD  are  completely  correlated  (that  is, 
r\  —  1)  gives  the  condition 

(ana22  -  ai2a2i)2  (s2xy  ~  SxxSyy)  =  0 

l-^-l  ( sxy  ~  sxxSyy)  ~  0* 

Thus  it  appears  as  if  the  truss  system  we  developed  in  §  3.3  is  a  special  case,  and  in  general 
ill-conditioning  implies  correlation,  but  not  the  reverse.  (The  other  possibility  is  that  all 
trusses  lead  to  the  special  case  shown  above.  This  idea  was  not  investigated  further.) 

The  above  analysis  was  undertaken  for  the  limiting  case  of  completely  correlated  mea¬ 
sured  outputs  (t>  =  1).  We  need  to  ask  the  same  questions  of  partially  correlated  outputs. 

4.8.1  Partial  Correlation:  Simulation  of  Random  Truss  Geometries 

Analytically  solving  for  conditions  such  as  when  k  approaches  infinity  and  rG  nears 
unity  would  result  in  a  complex  set  of  conditions.  Instead  a  simulation  was  conducted 
to  determine  the  effects  of  configuration  (of  the  five-member  truss  developed  in  §  3.3)  on 
both  the  output  correlation  and  the  condition  number. 

Before  we  look  at  the  relation  between  output  correlation  and  condition  number,  let’s 
transform  the  range  of  the  condition  number,  k  €  [1,  oo),  to  that  of  output  correlation, 
tv  G  [0, 1].  One  transformation  that  will  achieve  this  mapping  is  given  by 

*'  =  1  ~  ^2  >  (4.61) 

which  we  term  the  transformed  condition  number.  (The  only  reason  k  squared  was  used 
in  the  above  equation  was  to  eliminate  the  square  root  in  the  expression  for  condition 
number  given  by  Equation  (2.18).) 

We  begin  by  evaluating  the  form  of  the  transformed  condition  number  for  the  truss. 
From  Equations  (2.18)  and  (4.56)  we  have  that  the  condition  number  for  the  five-member 
truss  system  developed  in  §  3.3  is  given  by 

*  =  \/1  +  d ~i’  (4-62) 

where 

-2  = _ {2(q2  +  1)72  +  /?2(a2  +  72  +  2)  -  20y  [q(q  +  7)  +  2]}2 _ 

4(0  -  7)2  K0  -  27)  +  /?7]2  +  {2(q2  -  1)72  +  02(q2  +  72  -  2)  -  2/?7  [q(q  +  7)  -  2]}2 ' 

Using  Equations  (4.61)  and  (4.62)  we  then  have  that  the  transformed  condition  number 
is  given  by 


We  now  derive  the  form  of  the  output  correlation.  If  we  assume  the  two  inputs  (Px 
and  Py)  are  completely  uncorrelated  (that  is,  sxy  =  0),  then  Equation  (4.60)  simplifies  to 

r2  _  [(0  -  7)s'  -  q(q  -  0b]2 

a  sxy-o  (s'  +  a1)  [(0  -  7) 2 s'  +  (q  -  0)272]  ’ 
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where  s'  is  the  ratio  of  the  two  input  covariances,  that  is,  s'  =  sxx/syy. 

Now  that  we  have  both  the  transformed  condition  number  and  the  output  correlation 
we  can  compare  how  these  two  quantities  are  related  using  a  simulation  of  random  truss 
geometry. 

For  our  simulation  the  truss’  configuration  was  varied  by  randomly  choosing  values  for 
the  parameters  a,  /?,  and  7  (these  parameters  uniquely  define  the  truss’  non-dimensional 
geometry,  see  §  3.3).  The  parameters  a,  /?,  and  7  were  randomly  selected  using  a  uniform 
distribution  with  the  following  ranges,  7  G  [e,  1],  (3  G  [e,  1  —  e],  and  a  G  [e,  /?],  with 
e  set  to  e  =  0.01  (an  arbitrary  choice  of  a  small  number).  Using  these  random  truss 
parameters  10000  condition-correlation  coordinate  pairs  («/,  ra)  were  generated,  we  term 
these  coordinate  pairs  “points” .  Finally,  in  the  evaluation  of  the  output  correlation  ra  we 
have  selected  the  input  forces  Px  and  Py  to  be  completely  uncorrelated  (that  is,  sxy  —  0), 
and  the  ratio  of  the  input  variances  to  be  unity  (that  is,  s'  =  sxx/syy  =  1). 


count 


(a)  (b) 

Figure  f.5:  Bin  counts  of  10000  coordinate  pairs  («/,7v)  determined 
using  random  values  of  a,  (3,  and  7. 


Figure  4.5  shows  the  result  of  bin  counts  on  this  10  000  point  set.  Both  plots  shows  how 
many  of  the  10000  points  occurred  in  any  one  rf~rG  bin,  and  the  shading  represents  the 
number  of  points  in  any  one  bin  (the  darker  the  shade  the  more  points).  The  transformed 
condition  number  k'  and  output  correlation  ra  ranges  were  partitioned  into  20  equal  inter¬ 
vals  for  the  three-dimensional  plot  and  40  equal  intervals  for  the  two-dimensional  plot.  Due 
to  this  difference  in  partitioning,  the  shading  in  the  three-  and  two-  dimensional  plots  rep¬ 
resent  different  point  counts.  Finally,  note  that  the  vertical  scale  on  the  three-dimensional 
plot  and  the  shading  in  both  plots  is  logarithmic. 

We  see  from  Figure  4.5  that  the  majority  of  geometric  configurations  result  in  systems 
that  tend  to  be  ill-conditioned  (k!  ~  1)  and  result  in  correlated  outputs  (ra  ~  1).  This 
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result  may  be  due  to  the  fact  that  the  vast  majority  of  random  geometric  configurations 
would  have  a  skewed  appearance,  and  hence  we  would  expect  the  resulting  truss  system 
to  be  ill-conditioned.  If  the  range  of  the  geometric  parameters  (a.  j3,  and  7)  are  changed, 
then  the  resulting  bin  count  differs  to  that  shown  in  Figure  4.5.  However,  several  features 
remain  constant  even  when  the  geometric  parameter  ranges  are  changed. 

•  The  correlation  between  the  two  outputs,  ra,  appears  to  always  be  less  than  the 
transformed  condition  number  k'  .  (There  is  no  shading  above  the  k'  =  ra  line.) 

•  Most  geometric  configurations  have  a  higher  probability  of  tending  towards  ill- 
conditioning  and  high  output-correlation  than  any  other  n'-rc  combination.  (The 
dark  shading  is  concentrated  around  the  top  righthand  corner.) 

•  The  degree  of  ill-conditioning  k'  and  output  correlation  ra  appear  to  be  highly  cor¬ 
related.  (Notice  the  black  shading  near  the  k'  =  ra  line.) 

Figure  4.5  also  shows  us  that  even  if  the  truss  geometry  is  ill-conditioned  (that  is, 
k  ~  1),  this  doesn’t  necessarily  imply  that  the  outputs  will  be  correlated  (that  is,  it  is 
possible  to  obtain  rCT  <  1).  This  observation  is  highlighted  by  the  fact  that  the  shading 
for  k'  k,  1  occurs  around  both  ra  small  and  ra  «  1.  Additionally,  these  preliminary 
simulations  suggest  that  the  output  measurements  will  never  be  more  correlated  than  the 
ill-conditioning  of  the  underlying  truss  configuration  (that  is,  ra  <  n'). 

In  this  section  we  determined  both  the  correlation  and  condition  number  of  the  five 
member  truss  used  for  simulations.  We  showed  that  correlated  input  signals  do  not  imply 
an  ill-conditioned  system,  and  conversely,  that  an  ill-conditioned  system  does  not  imply 
correlated  input  signals.  Due  to  the  complexity  in  developing  conditions  for  near  unit 
correlation  and  near  ill-conditioning,  we  resorted  to  simulations.  As  expected,  we  found 
(using  a  simulation  of  thousands  of  random  truss  geometries)  that  correlation  and  condi¬ 
tion  number  were  loosely  related.  Additionally,  and  unexpectedly,  the  simulations  suggest 
that  the  input  signal  correlation  is  always  better  than  the  ill-conditioning  of  the  system, 
which  is  good  news. 


4.9  Simulation  Results  of  the  Corrected- Vector  Technique 

In  this  subsection  we  implement  the  result  of  this  section,  namely  the  corrected  least 
squares  (CLS)  procedure  applied  to  the  ordinary- vector  technique.  (In  this  subsection  we 
rename  the  “vector”  technique  from  previous  sections  as  the  “ordinary-vector”  technique 
to  clearly  distinguish  it  from  the  corrected-vector  technique.) 

A  simulation  of  random  load  inputs  into  the  five  member  truss  (developed  in  §  3.3)  was 
undertaken.  The  simulation  was  the  same  as  that  carried  out  earlier  in  §  3.4.  Different 
approximation  orders  (the  number  of  points  in  the  LS  solution)  were  tested  for  accuracy, 
and  within  each  approximation  order  1000  solution  batches  were  obtained.  From  these 
1000  batch  sets,  the  distribution  of  errors  could  be  ascertained  by  determining  the  outliers 
and  quartiles  of  the  sets.  (See  §  3.4  for  further  details.) 


48 


DSTO-RR-0204 


The  median  solution  in  this  subsection  was  calculated  using  the  L\  (or  spatial)  median, 
where  the  distance  from  the  median  to  the  1000  solutions  was  minimised.  (For  more  details 
on  the  L\  median  see  Small  [31]  or  for  a  brief  account  Martin  [21].)  In  contrast,  in  §  3.4 
the  median  solution  was  calculated  as  the  median  of  each  component.  Let  x  denote  an 
n-dimensional  vector  having  components  x  =  (aq, . . . ,  xn)  and  x\  be  one  of  the  m  data 
points  used  to  determine  the  median.  Then  the  L\  median  and  component  median  are 
defined  respectively  as 

m 

minV^  II*  -  ®j||2,  and  (xn,  xi2,  ■  ■ .  ,xin) , 

X 

i= 1 

where  Xij  is  the  j th  component  of  the  ith  data  point  and  =  median^x^),  for  example, 
is  the  median  of  the  data’s  second  component  (over  all  the  data  points  i  =  1, . . . ,  m). 

Figures  4.6  illustrates  sample  solutions  for  the  or  dinary- vector,  matrix,  and  corrected- 
vector  technique.  These  solutions  are  plotted  as  the  a\  and  a 2  coefficients  of  Equation  (3.8). 
Within  each  of  these  plots  the  effect  of  approximation  order  (the  number  of  points  used 
to  develop  the  solution)  is  depicted.  Six  solutions  were  obtained  for  each  approximation 
order  of  2,  8,  32,  128,  or  512  points  with  associated  star,  square,  circle,  triangle,  and  cross 
symbols  respectively. 

The  most  striking  feature  of  these  illustrations  is  that  the  solutions  appear  to  lie  on 
(or  close  to)  a  straight  line.  A  brief  experimentation  with  system  condition  number  seems 
to  suggest  that  this  phenomenon  is  a  product  of  an  ill-conditioned  system.  (Perhaps  this 
straight  line  aligns  with  the  null-space  of  the  system?) 

Due  to  the  large  scale  changes,  the  three  plots  shown  in  Figure  4.6  have  been  plotted 
on  a  log-like  scale.  The  spatial  median  of  the  highest  order  approximation  (512  points) 
for  each  technique  was  set  as  the  origin  of  the  inverse  hyperbolic  sine  function  (sinh"1). 
Mathematically,  if  x  represents  the  original  space,  then  the  transformed  space  x'  is  given 
by 

xf  =  sinh-1  (x  —  Xi) , 

where  x%  =  median*  (aq)  is  the  spatial  median  of  the  data  points  aq.  This  hyperbolic 
mapping  maintains  a  near-linear  spacing  around  the  origin  (the  median  in  our  case)  and 
produces  a  log-like  expansion  away  from  the  origin.  Unfortunately,  compared  to  the  linear 
plots  these  log-like  plots  de-emphasise  the  difference  in  solution  spread  as  the  approxi¬ 
mation  order  changes.  However,  this  hyperbolic  transformation  is  required  to  show  the 
different  solutions,  with  a  reasonable  spread,  on  the  one  plot. 

The  centre  of  the  horizontal  and  vertical  straight  lines  (the  cross  hair)  denotes  the 
location  of  the  exact  solution  (given  by  the  coefficients  of  Equation  (3.9)). 

We  again  see  the  characteristics  described  earlier  (in  §  3.4,  especially  Figure  3.7)  for  the 
ordinary-vector  and  matrix  techniques.  The  spread  of  ordinary-vector  solutions  is  small, 
and  decreases  with  increasing  approximation  order.  However,  the  or  dinary- vector  solution 
moves  away  from  the  exact  solution  as  the  approximation  order  increases.  In  contrast,  the 
matrix  technique  has  a  larger  spread,  but  its  solutions  contract  around  the  exact  solution 
as  the  approximation  order  increases. 

The  corrected- vector  technique  is  similar  to  the  matrix  technique  in  that  its  solutions 
have  a  large  spread  and  they  contract  around  the  exact  solution  as  the  approximation  order 
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Figure  4.6:  Sample  solutions  (a}  and  a2  coefficients)  for  different  ap¬ 
proximation  orders  (number  of  points  used  to  develop  the  solution).  The 
cross-hairs  denote  the  location  of  the  exact  solution. 
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increases.  The  corrected-vector  technique,  however,  has  a  far  larger  spread  (by  two  orders 
of  magnitude)  than  the  matrix  technique.  Additionally,  the  corrected-vector  technique 
only  begins  to  contract  around  the  exact  solution  once  the  approximation  order  is  large. 
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Figure  f.l:  Comparison  of  the  normalised  error  of  the  corrected-vector 
technique  (shaded  planes)  to  the  ordinary-vector  technique  (circles)  and 
matrix  technique  (triangles).  The  distribution  of  errors  is  illustrated 
by  the  error  outliers  (denoted  as  lower  and  upper),  the  first  and  third 
quartiles  (Q\  and  Qz),  and  the  median. 


The  distribution  of  the  normalised  error  for  the  corrected-vector  technique  is  plotted 
in  Figure  4.7  (for  a  comparison  see  Figure  3.7  on  page  18).  For  each  of  the  approximation 
orders  (2,  4,  8, . . . ,  1024  points)  we  plot  the  smallest  outlier  (denote  as  “lower”),  the  first 
quartile  (Qi),  the  median,  the  third  quartile  (Q3),  and  the  largest  outlier  (denoted  as 
“upper”),  which  allows  us  to  determine  the  distribution  of  errors. 

The  distribution  of  errors  for  the  corrected-vector  technique  is  shown  as  shaded  planes. 
In  contrast,  the  distribution  of  errors  for  the  ordinary-vector  technique  and  matrix  tech¬ 
nique  use  box-plots.  In  these  box-plots  the  symbols  (either  circles  or  triangles  for  the 
ordinary-vector  and  matrix  techniques  respectively)  are  placed  at  the  outliers  and  the 
median.  The  first  and  third  quartiles  are  located  at  the  lower  and  upper  extremes  of  the 
box. 

As  can  be  seen  in  Figure  4.7  the  corrected-vector  technique  is  not  as  accurate  as  the 
matrix  technique.  In  fact,  for  low  order  approximations  the  or  dinary- vector  technique 
also  produces  smaller  errors  than  the  corrected-vector  technique.  In  general  we  see  that 
for  high  order  approximations  the  errors  of  the  corrected-vector  technique  are  roughly  an 
order  of  magnitude  larger  than  the  matrix  technique.  Finally,  note  the  large  spread  of 
errors  exhibited  by  the  corrected-vector  technique 
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Legend 

Q - a  vector 
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. -  matrix 
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Figure  4.8:  Comparison  of  the  normalised  error  of  the  median  solution 
(dashed  lines)  for  the  corrected- vector  ( triangles ),  ordinary-vector  (cir¬ 
cles),  and  matrix  (squares)  techniques.  For  comparison  the  median  of 
the  normalised  errors  (solid  lines)  are  also  shown. 


Figure  4.8  shows  how  both  the  median  of  the  errors  and  the  error  of  the  median  solution 
vary  with  approximation  order  (see  page  19  for  a  discussion  on  the  difference  between 
these  two  errors).  We  see  that  both  the  corrected- vector  and  the  matrix  techniques’  error 
improves  by  taking  the  median  of  the  1000  solutions  evaluated  for  each  approximation 
order.  For  these  two  techniques  the  error  of  the  median  improves  by  an  order  of  magnitude 
(approximately)  when  compared  to  the  median  of  the  errors. 

We  have  seen  that  although  the  corrected-vector  technique  is  more  accurate  than  the 
ordinary- vector  technique  (for  large  approximation  orders),  the  accuracy  is  still  inferior  to 
that  of  the  matrix  technique.  This  inferior  accuracy  is  most  likely  due  to  the  sensitivity 
of  the  correction  to  errors  in  noise  estimations  (as  outlined  earlier). 


4.10  Simulation  Results  of  the  Surrogate-Matrix  Technique 

Although  the  matrix  technique  yields  good  results,  it  requires  the  use  of  external  loads 
during  calibration.  We  saw  earlier  in  §  4.7  that  the  reason  the  matrix  technique  produced 
good  results  was  that  the  troublesome  denominator  was  eliminated  using  external  loads.  A 
question  then  naturally  arises:  From  the  perspective  of  the  linear  system  what  is  so  special 
about  the  external  loads?  The  simple  answer  is  nothing.  Why  then  don’t  we  use  additional 
information  from  elsewhere  in  the  linear  system  (for  example,  extra  gauges  instead  of 
external  loads)  to  eliminate  the  troublesome  denominator.  In  the  current  subsection  this 
is  the  procedure  we  investigate,  which  we  term  the  “surrogate-matrix”  technique.  This 
surrogate”  prefix  seemed  appropriate  since  the  extra  gauges  are  acting  as  a  surrogate 
for  the  external  loads.  For  brevity  we  will  sometimes  simply  refer  to  gauge  aAC,  for 
example,  which  should  be  read  as  the  gauge  located  on  beam  member  AC  that  measures 
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the  stress  aAC .  From  now  on  the  “matrix”  technique  is  sometimes  referred  to  as  the 
“ordinary-matrix”  technique  to  remind  the  reader  that  it  is  different  to  the  surrogate- 
matrix  technique. 

Before  carrying  out  the  simulations  let’s,  first  reflect  on  possible  outcomes.  In  the 
simple  truss  that  we’ve  chosen  to  simulate  (see  Figure  3.4),  we  decided  to  model  what 
appeared  to  be  the  most  ill-conditioned  system  for  the  chosen  configuration  (refer  to  §  3.4 
for  a  more  detailed  discussion).  We  can  have  up  to  five  strain  gauges  yielding  different 
results  on  this  truss,  but  three  of  those  gauges  are  already  being  used  in  our  investigation 
(remember  we  have  simulated  the  relation  a()C  =  f(crOA,<rAB)).  Hence  there  are  only  two 
gauges  available  ( aAC  and  <rBC)  to  replace  the  two  external  loads  (Px  and  Py) — which,  by 
the  way,  is  fortunate!  If  we  use  these  two  gauges,  however,  we  run  the  risk  of  somehow 
improving  the  overall  condition  number  of  the  modified  system.  This  improved  state 
follows  since  if  we  had  already  chosen  the  worst  possible  gauge  configuration,  then  any 
surrogate  gauges  can  only  improve  the  situation.  This  whole  argument  would  imply  that 
any  accurate  solutions  may  be  simply  due  to  the  fact  that  the  modified  system  is  well- 
conditioned  (as  compared  to  the  ill-conditioned  system  used  for  simulations  thus  far). 

We  need  to  differentiate  whether  good  results  are;  (i)  due  to  (the  possibility  of)  im¬ 
proved  condition  number  (resulting  from  the  surrogate  good  gauges);  or  (ii)  due  to  the 
improved  technique  (namely  the  surrogate-matrix  technique).  The  simulations  were  ad¬ 
ditionally  carried  out  using  the  “good”  gauges  in  order  to  reduce  the  effect  of  improving 
the  condition  number.  In  other  words,  we  solved  for  the  linear  system  croc  =  f(aAC^  crBc) 
(good  gauges)  using  the  ordinary-matrix  technique,  with  the  surrogate  “bad”  gauges  ( aAC 
and  aBC )  used  for  the  surrogate-matrix  technique.  We  use  the  terms  “good”  and  “bad” 
gauges  (or  system)  to  refer  to  the  condition  number  of  the  resulting  system  (of  linear  equa¬ 
tions)  determined  using  that  configuration  of  gauges.  These  terms  are  relative,  so  that 
gauge  configurations  were  judged  to  be  good  or  bad  depending  on  structural  knowledge 
and  simulation  results.  (For  further  details  on  gauge  selection  see  §  3.4.) 

Figure  4.9  shows  how  the  surrogate-matrix  technique  performs.  Both  plots  show  nor¬ 
malised  error  versus  approximation  order  (number  of  measurement  points  used  to  develop 
the  solution).  The  top  plot  shows  the  bad  system, 

ooc  =  /  (ctoa,  (Tab]  [Px,  Py]  or  [aAC,  crBC ]) ,  (4.63) 

which  in  the  surrogate-matrix  solution  replaces  external  loads  (Px  and  Py)  by  the  good 
gauges  (aAC  and  aBC).  Similarly,  the  bottom  plot  shows  the  good  system, 

<?OC  =  f  i&ACi  &BC\  [Fx>  Py]  or  [crA„,aAB]),  (4.64) 

which  in  the  surrogate-matrix  solution  replaces  external  loads  (Px  and  Py)  by  the  bad 
gauges  (<joa  and  aAB ).  These  results  add  weight  to  the  hypothesis  that  the  surrogate- 
matrix  technique  performs  just  as  well  as  the  ordinary-matrix  technique,  even  when  the 
surrogate  information  is  more  ill-conditioned  than  the  original  system.  If,  on  the  contrary, 
the  good  results  in  the  top  plot  were  due  to  the  improved  condition  number  of  the  system, 
then  the  normalised  error  should  be  closer  to  that  of  the  bottom  plot,  which  it  is  not. 

From  Figure  4.9  we  see  that  our  suspicions  about  the  ill-conditioning  of  the  good 
(Equation  (4.63))  and  bad  (Equation  (4.64))  systems  are  confirmed.  The  normalised  error 
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Figure  4-9:  Comparison  of  the  normalised  error  of  the  median  solu¬ 
tion  for  the  corrected-vector  (squares),  ordinary-matrix  (triangles),  and 
surrogate-matrix  (circles)  techniques.  The  top  plot  shows  the  “bad”  sys¬ 
tem,  the  system  used  for  all  simulations  thus  far.  The  bottom  plot  is  the 
“good”  system. 


of  the  bad  system  (top  plot)  is,  on  the  whole,  worse  than  the  normalised  error  of  the  good 
system  (bottom  plot).  For  our  chosen  truss  configuration  (< a  =  1.000,  (3  =  2.000,  and 
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7  =  1.100,  see  page  16),  the  use  of  Equations  (2.18)  and  (4.56)  allows  us  to  compute  the 
bad  system’s  condition  number  as  n  «  20.17.  A  similar  computation  yields  the  condition 
number  of  the  good  system  as  k  ~  2.419.  For  the  chosen  truss  configuration  (at  the  very 
least),  these  condition  numbers  confirm  our  suspicions  about  the  relative  good  and  bad 
assessment  of  the  gauge  configurations. 

As  a  final  point  note  that  due  to  the  improved  condition  number  of  the  good  sys¬ 
tem,  the  solutions  of  the  ordinary-matrix  and  surrogate-matrix  techniques  weren’t  the 
only  techniques  to  improve.  All  the  vector-based  techniques  also  improved,  including  the 
ordinary-vector  technique.  This  emphasises  the  fact  that  for  well-conditioned  systems 
with  small  amounts  of  noise,  there  are  only  marginal  gains  to  be  made  from  using  the 
matrix-based  techniques  over  the  vector-based  techniques. 

The  surrogate-matrix  technique  performed  as  well  as  was  theoretically  predicted.  Using 
additional  truss  gauges  instead  of  external  loads,  the  troublesome  denominator  found  in 
the  vector  technique  was  successfully  eliminated.  We  have  simulation  evidence  to  suggest 
that  the  similar  accuracy  between  the  ordinary- matrix  and  surrogate-matrix  techniques  is 
due  to  the  underlying  methods  and  not  due  to  the  improvement  of  the  system’s  condition 
number.  Of  all  the  techniques  sampled  thus  far,  the  surrogate-matrix  has  the  highest 
accuracy,  but  requires  additional  gauges  as  compared  to  the  vector-based  techniques.  We 
also  noted  that  the  vector-based  techniques  perform  with  comparable  accuracy  to  the 
matrix-based  techniques  for  well  behaved  systems. 


4.11  Jack-Knife  Correction  and  Bootstrap 

We  learned  of  the  jack-knife  correction  (JKC)  at  the  end  of  §  4.2  (see  page  25),  which 
improved  the  order  of  the  approximating  technique.  Due  to  the  vector  technique’s  incon¬ 
sistency,  the  JKC  cannot  be  used  on  it.  All  the  remaining  techniques  have  the  potential  to 
be  improved  by  the  JKC;  however,  the  variance  of  the  solutions  (see  Figures  3.7  and  4.7) 
renders  the  JKC  given  by  Equation  (4.12)  unusable.  This  follows  since  the  large  variance 
of  solutions  (for  low  order  approximations)  means  that  tm  in  Equation  (4.12)  has  a  better 
than  average  chance  of  being  far  from  the  exact  solution. 

The  JKC  given  by  Equation  (4.13)  (the  second  order  correction)  doesn’t  suffer  from 
this  potentially  bad  initial  estimate,  and  hence  we  suspect  that  it  would  provide  a  better 
correction  than  Equation  (4.12)  (the  first  order  correction).  The  amount  of  computational 
effort  required  by  the  first  and  second  order  corrections  (Equations  (4.12)  and  (4.13) 
respectively)  goes  up  linearly  and  quadratically,  respectively,  as  the  number  of  points  used 
in  the  approximation. 

To  avoid  the  quadratic  penalty  of  Equation  (4.13),  perhaps  the  re- formulation  of  Equa¬ 
tion  (4.12)  as 

tm  =  1)  ^m— 1  —  {pl  2)  tm— 2 

might  provide  a  reasonable  first  order  correction?  In  the  above  equation  tm_ 2  should  be 
computed  using  the  original  data  set  deleting  two  points  at  a  time,  but  only  applying  this 
deletion  0(m)  times  (since  it  is  a  first  order  correction).  No  analysis  (neither  experimental 
nor  theoretical)  of  this  suggested  re-formulation  was  undertaken,  and  as  such  should  not 
be  used  until  verified. 
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The  JKC  uses  the  mean  in  its  definition,  see  Equations  (4.12)  and  (4.13).  As  already 
discussed  this  is  not  an  appropriate  measure  for  the  multi- dimensional  data  at  hand,  and 
perhaps  a  good  substitution  would  be  the  spatial-median. 

Given  that  for  each  solution  we  are  required  to  solve  a  least  squares  (LS)  system  (nor¬ 
mally  through  the  use  of  singular  value  decomposition),  the  JKC  becomes  computationally 
burdensome.  However,  since  the  development  of  the  transfer  function  is  essentially  a  once- 
off  calibration,  the  large  computational  effort  required  may  be  warranted.  Overall  the  JKC 
is  recommended  only  if  there  is  a  lack  of  data. 

In  an  earlier  report  [25]  it  was  thought  that  the  bootstrap  method  might  improve  the 
vector  technique’s  results.  We  now  realise  that  the  errors  in  the  vector  technique  were  due 
to  the  technique  s  inconsistent  nature,  and  hence  the  re-sampling  offered  by  the  bootstrap 
technique  (see  Press  et  al  [26]  for  example)  would  not  eliminate  these  errors. 


4.12  Two  Input  Surrogate  Matrix  Technique 
versus  Four  Input  Redundant  LS 

In  this  section  we  determine  if  there  is  any  advantage  in  using  additional  redundant 
information  in  the  ordinary  LS  technique.  We  compare  the  surrogate  matrix  technique 
(with  two  gauges)  to  the  redundant  least  squares  (LS)  technique  (with  four  gauges).  More 
specifically,  the  two  input  surrogate  matrix  technique  has  either  one  of  the  forms 

(T()C  =  A\<joa  -f  A^Oab  or  g()C  =  B\oac  +  Bz&bc  (4.65) 

depending  on  whether  the  good  system  or  bad  system  respectively  is  being  modelled, 
where  A2,  B\,  and  B<i  are  constants.  The  four  gauge  redundant  LS  approximation 
has  the  form 

uoc  =  C\aOA  +  ab  +  CscrAC  -f  Cacfbc,  (4.66) 

where  Ci,  C2,  C3,  and  C4  are  constants.  Notice  that  measurements  from  all  four  gauges 
are  being  used  to  predict  the  stress  at  gauge  0A,  despite  the  fact  that  only  two  gauges  are 
needed  to  completely  specify  the  system  (see  Equation  (3.9)  for  example).  We  will  show 
(for  the  one-dimensional  case)  that  like  the  ordinary  LS  (that  is,  the  vector)  technique  we 
have  already  investigated,  the  redundant  LS  technique  is  also  inconsistent. 

Let’s  begin  with  a  simple  one-dimensional  linear  system  whose  solution  is  given  by 

z  =  cx,  (4.67) 

where  z  is  completely  specified  by  the  constant  c  and  variable  x.  Introduce  a  second 
variable  y ,  which  is  a  linear  function  of  x,  that  is, 

V  =  Ox,  (4.68) 

where  6  is  a  constant.  We  can  now  write  down  z,  given  x  and  y,  as 

z  =  ax  +  by ,  (4.69) 

where  both  a  and  b  are  constants.  The  constant  9  is  a  function  of  the  three  constants  a, 
b:  and  c,  and  from  the  above  three  equations  we  have  that  8  =  ( c-a)/b .  Equation  (4.69) 
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represents  the  one-dimensional  equivalent  of  Equation  (4.66);  while  Equation  (4.67)  is  the 
one-dimensional  ordinary  LS  solution  of  Equation  (4.65).  (Note  that  Equation  (4.67)  is 
the  ordinary  LS  and  not  the  surrogate  matrix  technique.) 

We  want  to  determine  if  there  is  any  advantage  in  using  the  additional  information 
given  by  y  in  developing  our  predictive  system,  given  that  this  information  (y)  is  correlated 
with  information  we  already  have  (namely,  x).  In  other  words,  we  are  using  additional 
information  that  is  redundant. 

All  the  variables  (x,  ?/,  and  z)  have  noise,  so  we  can  use  the  results  derived  earlier. 
From  Equation  (4.16)  we  can  write  that 

o_  _  1  +  Tjy  -  r  ( brjy  ys^)  /  (a%  Vfxx) 

a  ~  Vx  (1  +  r]x)(l  +  r)y)  -  r2 


From  Equation  (4.68)  we  have  that  r  =  1  (since  x  and  y  are  completely  correlated)  and 
xx ,  then  the  above  expression  becomes 


syy  = 


£  _  1  _  1  ±Vy  ~  (KI6>D/(q^) 

a  l  +  Vv  +  %/% 


(4.70) 


1 

c  —  a 

b  riy 

± 

b 

a  T)x(l  +T]y) 

1  + 

1  -T$ 

'  l  +  $  ’ 


Vy 


rjx{\  +  r/y ) 


where  T  =  sgn(6/a)|c/a  —  1|  and  $  =  7]y/[rjx(l  +%)]•  (The  operator  “sgn”  takes  the  sign 
of  its  argument,  that  is  sgn(x)  =  x/\x\.) 

We  now  see  that,  just  like  the  ordinary  LS  approximation,  the  redundant  LS  approx¬ 
imation  of  a  one-dimensional  system  (Equation  (4.69))  is  an  inconsistent  approximation 
of  the  true  solution  (except  for  the  special  case  =  1).  In  other  words,  the  error  in 
Equation  (4.69)  will  not  tend  to  zero  as  the  number  of  points  used  to  develop  the  LS 
solution  tends  to  infinity  (unless  =  1). 

Let  us  now  investigate  the  variance  of  the  prediction  error  for  the  two  approximations 
given  by  Equations  (4.67)  and  (4.69).  First  define  the  relative  error  (or  more  pedantically, 
the  negative  of  the  relative  error)  in  the  coefficient  a  given  by  Equation  (4.70)  as 

ae  =  1 - 

a 

=  !  +  %-  (HjgD/Hs) 

1+J h  +  Vyhx 

a  similar  expression  arises  for  the  coefficient  b.  Also  define  the  prediction  error  of  2;  as 


e?,  —  z<j  Zj , 


where  z\  is  the  prediction  given  by  Equation  (4.69)  and  Z{  —  ax\  +  byi  is  the  exact  solution 
(both  for  the  ith  measurement).  The  prediction  Z{  =  a(xi  +  U{)  +  b(yi  +  Vi)  involves  noise 
in  both  the  X{  and  yi  variables  given  respectively  by  and 
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Assume  the  noise  terms  are  uncorrelated  with  the  both  the  signal  and  other  noise  terms 
(that  is,  e  =  0  in  the  limit).  After  some  tedious  algebra  the  variance  of  the  prediction 
error  can  be  shown  to  be 

var(e)  =  {aae  +  6bbe)2sxx  +  a2{  1  -  ae)2suu  +  b2(  1  -  be)2svv.  (4.71) 

Re-arranging  the  relative  error  from  Equation  (4.70)  gives  that 

_  Vx  +  vxVy-vy\o\bla 
Vx+VxVy  +  Vy 

and  similarly  for  be  we  have  that 


b  =  Vy_±VxVy  -r}xa/(\0\b) 

Vx  +  VxVy  +  Vy 

A  small  amount  of  algebra  then  gives  the  following  three  relations 

aae  +  0bbe  =  —  -  H — sgn^^  ~  ~  7h/3i)} 

1  +Vx  +  Vx/Vy 

o(l  -  ae)  =  -sgn(^)  +  Q  t1  - 

l+Vx+Vx/Vy 

and 


6(1  _  be)  =  Vx  -  -  -  H  ~  Sg11^)] 

Vy  o  1  +  T]x  +  T)x/riy 


To  allow  a  comparison  with  the  one-variable  LS  (Equation  (4.67))  let’s  make  the  sim¬ 
plifying  assumption  that  0  >  0  (that  is,  there  is  a  positive  correlation  between  x  and  y). 
Substituting  the  above  three  simplified  expressions  into  Equation  (4.71)  gives 


var(e)  = 


1  +  Vx  +  Vx/Vy 


2 

Vx^xx  &uu  +  Sy 


Vx 


1  2 


VyO  J 


Dividing  both  sides  by  sxx  (in  other  words  var(x))  and  using  the  relation 


simplifies  the  above  result  to 


Svv  Sw  Syy 
&xx  Syy  &xx 


var(e)  _  c2  /  % 

var(x)  \ 1  +  Vx  +  Vx/Vy 


(4.72) 


Now  let’s  develop  the  variance  of  the  prediction  error  for  the  one-variable  approxi¬ 
mation  given  by  Equation  (4.67).  We  have  already  developed  the  relative  error  of  the 
coefficient  c  (see  Equation  (4.36))  from  which  we  know  that 

1 

c  =  c- - . 

1  +  Vx 


58 


DSTO-RR-0204 


Define  the  predictive  error  of  the  one-variable  LS  prediction  (given  by  Equation  (4.67))  as 


where  S*  =  c(xi  +  ui).  Following  the  same  procedure  as  for  the  two- variable  case  we 
have  that  the  variance  of  the  predictive  error  for  the  one- variable  approximation,  given  by 
Equation  (4.67),  is 


var(e)  =  ^  (  r\x  \ 
var(x)  \  1  +7lx) 


(4.73) 


Comparing  the  one-variable  prediction  (Equation  (4.73))  and  the  two-variable  predic¬ 
tion  (Equation  (4.72))  we  see  that  the  two- variable  prediction  produces  superior  predic¬ 
tions,  especially  when  rjx/rjy  ^  1.  Remember,  however,  that  the  one- variable  ordinary  LS 
prediction  is  inconsistent  (as  is  the  two- variable  redundant  LS  technique). 

A  simulation  of  the  five  member  truss  (developed  earlier)  demonstrated  the  inconsistent 
behaviour  of  the  redundant  LS  (Equation  (4.66))  technique.  Figure  4.10  shows  plots  of 
the  predictive  error  for  the  surrogate  matrix  (given  by  Equation  (4.65))  and  the  redundant 
LS  (given  by  Equation  (4.66)). 

As  before,  we  define  the  term  “approximation  order”  as  the  number  of  simulation 
measurements  used  to  develop  the  prediction  equation  (the  horizontal  axis  in  the  plots 
of  Figure  4.10).  Unlike  the  previous  simulation,  we  were  not  able  to  simply  compare  the 
approximate  coefficients  a  and  b  to  the  exact  coefficients  a  and  6,  since  the  redundant 
LS  now  contains  four  coefficients  instead  of  two  (see  Equation  (4.66)).  As  such  we  used 
the  error  in  predicting  the  stress  at  gauge  OC,  as  compared  to  the  exact  stress  at  gauge 
OC,  as  a  measure  of  accuracy.  For  each  approximation  order  we  developed  a  set  of  10 
prediction  equations  (for  both  the  surrogate  matrix  and  redundant  LS  techniques).  Each 
of  these  10  equations  were  developed  using  different  randomly  generated  measurement 
sets  (which  were  statistically  equivalent).  We  then  randomly  generated  a  new  set  of  100 
measurements,  and  used  this  set  to  test  the  accuracy  of  the  10  prediction  equations.  In  all, 
1000  simulation  measurements  were  used  to  compare  the  accuracy  of  both  the  surrogate 
matrix  and  redundant  LS  techniques  for  each  approximation  order. 

The  absolute  values  of  the  relative  error  (that  is,  \{aoc  —  Voc)/voc\)  of  these  1000 
simulated  measurements  were  then  used  to  develop  the  spread  of  error  plots  shown  in 
Figure  4.10.  For  both  the  surrogate  matrix  and  the  redundant  LS  shadings,  the  lower 
boundary  represents  the  first  quartile,  the  middle  line  the  second  quartile  (or  median),  and 
the  upper  boundary  the  third  quartile.  The  light  grey  and  dark  grey  shadings  represent 
the  error  spread  for  the  surrogate  matrix  and  redundant  LS  techniques  respectively.  The 
four  gauge  redundant  LS  plots  for  the  “good”  and  “bad”  systems  would  be  identical  if  the 
same  set  of  random  data  measurements  were  used,  as  it  is  these  plots  exhibit  the  same 
behaviour.  Notice  that  as  the  approximation  order  increases  the  redundant  LS  plots  tend 
to  the  same  limit  for  both  the  “good”  and  “bad”  systems. 

The  behaviour  exhibited  by  the  redundant  LS  is  the  same  as  that  of  the  ordinary  LS 
(compare  Figure  3.7  and  4.10).  For  low  order  approximations  the  LS  technique  produces 
smaller  errors  than  the  surrogate  matrix  technique  for  the  “bad”  system  (the  results  of 
the  two  techniques  are  comparable  for  the  “good”  system).  However,  for  high  order  ap¬ 
proximations  the  performance  of  the  surrogate  matrix  technique  is  superior.  Surprisingly, 
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Figure  4-10:  Plots  of  the  predictive  error  spread  for  both  the  surrogate 
matrix  (light  grey)  and  the  redundant  LS  (dark  grey)  techniques.  The 
lower  and  upper  boundaries  of  the  shading  represent  the  first  and  third 
quartiles  respectively.  The  lines  drawn  near  the  centre  of  each  shading 
represent  the  median  ( or  second  quartile ). 

the  surrogate  matrix  technique  even  performed  better  for  the  “bad”  system.  This  result  is 
surprising  since  for  the  bad  system  (the  bottom  plot  of  Figure  4.10)  we  are  only  using  the 
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two  “bad”  gauges  in  the  surrogate  matrix  technique  to  predict  the  stress  at  the  gauge  OC. 
In  contrast,  the  redundant  LS  technique  uses  all  four  gauges,  including  the  two  “good” 
gauges  to  estimate  the  stress  at  gauge  OC. 

Johnson  and  DiNardo  [13,  p.  88]  provide  results  to  a  similar  problem.  They  investigate 
the  effects  of  redundancy  (or  collinearity)  on  the  same  system  as  Equation  (4.69),  except 
they  assume  there  is  no  noise  in  the  variables  x  and  y.  Under  these  assumptions  they 
show  that  the  variance  of  the  constants  a  and  b  are 


var(a)  — 


(1  —  r2)sx 


and  var(6)  = 


{l~r2)Syy 


respectively,  where  sww  is  the  amount  of  noise  in  the  dependent  variable  z.  These  equations 
show  that  the  variance  of  the  coefficients  a  and  b  tend  to  infinity  as  the  two  inputs  x  and 
y  become  more  correlated.  It’s  interesting  to  note  that  under  this  collinear  condition 
the  results  may  be  improved  by  increasing  noise  in  the  variables  x  and  y.  This  noise 
increase  would  reduce  the  correlation  between  x  and  y,  and  hence  (according  to  the  above 
equations)  reduce  the  variance  of  the  a  and  b  coefficients!  We  need  to  balance  this  unusual 
statement  with  results  we  have  already  established;  namely,  for  noise  in  the  variables  x 
and  y  the  LS  technique  is  inconsistent.  So  that  although  the  variance  of  the  coefficient  a 
and  b  reduces  with  increasing  noise,  the  inconsistency  increases.  In  other  words,  we  are 
getting  a  tighter  bound  on  a  poorer  solution. 

In  this  subsection  we  have  shown  that  the  redundant  LS  technique  is  inconsistent  (for 
the  same  reason  as  the  ordinary  LS  technique).  We  also  discovered  that  the  redundant 
LS  is  marginally  superior  to  the  ordinary  LS  when  noise  is  present,  and  leads  to  solutions 
with  wide  uncertainty  intervals  when  the  noise  tends  to  zero. 
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5  Total  Least  Squares  (TLS) 

In  the  previous  section  we  found  that  the  ordinary  least  squares  (LS)  technique  yielded 
inconsistent  results  when  there  was  noise  in  both  the  input  and  output  signals.  The 
inconsistency  was  able  to  be  corrected,  resulting  in  the  corrected  LS  technique.  In  this 
section  we  investigate  the  properties  of  another  LS  based  technique  termed  total  least 
squares  (TLS),  and  determine  which  of  these  techniques  is  best  suited  to  developing  the 
transfer  functions  we  require. 

We  begin  by  giving  a  geometric  interpretation  of  the  TLS.  The  next  subsection  reviews 
comparisons  that  have  been  made  between  the  LS  and  TLS  methods.  We  then  investigate 
the  variance  of  both  the  corrected  LS  and  the  total  LS  techniques.  Finally,  we  report  on 
simulations  that  confirm  theoretical  results  presented  in  the  preceding  subsections. 


5.1  Introduction  to  the  Total  Least  Squares 

Nievergelt  [23]  presents  a  short  introduction  into  the  TLS  problem,  where  the  deriva¬ 
tion  of  the  TLS  solution  is  based  on  a  geometric  interpretation  of  results. 


y 


2.8 

2.6 

2.4 

2.2 


Figure  5.1:  The  lines  xLS  and  yLS  (left  plot)  minimise  the  distance  to 
the  data  points  in  the  x -  and  y-  directions  respectively.  The  right  plot 
has  the  same  data  points  and  plots  the  TLS  line ,  which  minimises  the 
perpendicular  distance  to  the  data  points. 


Figure  5.1  shows  the  difference  between  a  LS  solution  and  a  TLS  solution  for  two 
dimensional  data.  (A  similar  figure  appears  in  Golub  and  van  Loan  [9]  and  also  in  Niev- 
ergelt  [23].)  The  plot  on  the  left  shows  the  LS  solutions  xLS  (dashed  line)  and  yLS  (solid 
line)  that  minimise  the  distance  to  the  data  points  in  the  x-  and  y-  directions  respectively. 
(For  clarity  the  horizontal  distance  from  the  data  points  to  the  Xi$  line  are  not  shown.) 
The  plot  on  the  right  shows  the  TLS  solution  (solid  line)  that  minimises  the  distance  to 
the  data  points  (independent  of  direction).  From  geometry  we  know  that  the  shortest 
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distance  from  a  point  to  a  line  lies  in  a  perpendicular  direction  to  the  line  passing  through 
the  desired  point,  and  hence  all  distance  emanating  from  the  TLS  line  to  the  data  points 
are  perpendicular  to  the  TLS  line.  For  comparison  the  xLS  and  yLS  lines  are  also  shown  in 
the  right  plot  (drawn  as  dashed  lines).  Finally,  notice  that  all  three  approximations  (xLSi 
yLS,  and  tls)  pass  through  the  centroid  of  the  data  points,  and  as  might  be  expected  the 
xLS  and  yLS  solution  bound  the  tls  solution. 

The  plots  in  Figure  5.1  highlight  the  fact  that  the  TLS  approximation  is  only  superior 
if  the  variables  used  in  the  approximation  are  of  equal  importance.  However,  in  our 
situation  the  output  variable  (or  dependent  variable)  is  more  important  than  the  input 
variable  (or  independent  variable),  and  hence  it  is  more  important  to  minimise  the  error 
in  the  direction  of  the  output  variable.  (Note  that  in  some  situations  the  error  in  some  of 
the  variables  may  be  significantly  larger  than  the  error  in  the  remaining  variables.  Under 
this  condition  it  may  be  preferable  to  perform  the  LS  in  the  direction  of  the  variables  with 
largest  errors.) 

As  is  clearly  pointed  out  by  Van  Huffel  and  Vandewalle  [35],  the  errors-in- variables 
model  is  useful  when  the  primary  goal  is  model  parameter  estimation  rather  than  predic¬ 
tion.  For  prediction  and  also  when  the  data  significantly  violates  the  model  assumptions 
(for  example,  when  outliers  are  present),  the  TLS  estimates  may  be  quite  inferior  to  LS 
estimates. 

For  completion  we  list  a  result  obtained  by  Golub  and  van  Loan  [9]:  the  condition  of 
the  TLS  problem  is  always  worse  than  the  condition  of  the  corresponding  LS  problem. 


5.2  Comparing  the  LS  and  TLS  Methods 

In  this  subsection  we  begin  our  comparison  by  reviewing  results  related  to  the  two- 
dimensional  case. 

Chen  and  Papadopoulos  [5]  compare  the  errors  in  approximating  the  slope  and  the 
^-intercept  of  the  least  squares  (LS)  and  total  least  squares  (TLS)  lines.  The  exact  linear 
system  z  =  ax  +  b  is  approximated  by  z  =  ax  +  6,  where  a  and  b  are  the  slope  and 
^-intercept  of  the  exact  line,  and  a  and  b  are  approximations  of  these  constants.  The  true 
values  of  x  and  2  have  some  error  when  measured  and  hence  we  obtain  the  values  x  and 
z  respectively.  Assuming  the  random  noise  is  independent,  has  zero  mean,  and  that  the 
variance-covariance  matrix  exists,  they  develop  errors  for  the  slope  and  ^-intercept  using 
a  first  order  series  approximation.  (They  additionally  assume,  without  loss  of  generality, 
that  the  m  measured  points  used  to  develop  the  LS  or  TLS  have  zero  mean.)  The  LS  and 
TLS  are  found  to  have  the  same  errors  (to  first  order)  for  the  slope  and  ^-intercept  given 
by 

a  SXX  "b  SZZ  —  2 dSxz  _ j  a  SXX  "b  Szz  —  2aSXZ 

Sas  « - - -  and  srb  « - - - , 

&xx  //v 

where  Saa  and  s are  the  variances  of  the  LS  or  TLS  slope  and  ^-intercept  respectively, 
and  Szz  are  the  variances  of  the  measured  inputs  x  and  z  respectively,  sXz  is  the 
covariance  between  the  measurements,  and  sxx  is  the  variance  of  the  exact  x  input.  They 
conclude  that  to  minimise  the  variance  of  the  slope’s  error,  the  sample  points  x  should  be 
distributed  as  evenly  as  possible  at  the  end  points  of  the  x  input  range  (thus  maximising 

sxx )• 
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Riggs,  Guarnieri,  and  Addelman  [27]  empirically  investigate  thirty-four  different  fitting 
procedures  (eleven  previously  published  and  twenty-three  derived  by  the  authors).  They 
state  that  all  good  fitting  procedures  have  three  essential  characteristics:  invariance  under 
linear  transformations  (the  fitting  coefficient  is  dimensionally  correct),  small  root-mean 
square  error,  and  a  small  bias.  The  empirical  investigation  involves  approximating  the  line 
z  ~  x  with  the  variable  x  varying  over  a  range  of  600  units.  The  exact  x  measurements 
were  randomly  and  evenly  generated  using  a  Monte  Carlo  method  and  noise  was  added 
to  the  Xi  and  z%  values  yielding  x\  =  x\  +  Ui  and  %  =  Zi  +  respectively.  The  variances 
of  u  and  w  (the  amount  of  noise)  were  independently  varied  using  one  of  5.33%,  10.6%, 
21.3%,  or  42.7%  noise.  The  number  of  points  were  also  varied  over  6,  12,  24,  and  48. 

Citing  work  from  the  1800s  and  early  1900s  Riggs  et  al  [27]  provide  the  expression  for 
the  TLS  approximation  of  a  as 

^  szz  ~  sxx  +  \j {szz  “  sxx)2  +  - 

a  ~ - - - - 

where  =  var(rr)  and  s%z  =  cov(x,  z).  The  same  expression  shown  above  can  be  derived 
from  Linnik  [18,  Eq.  (0.1.29)],  except  the  sign  of  the  radical  becomes  ±.  The  above 
equation  is  termed  the  “maximum  likelihood”  estimator  by  Anderson  and  Sawa  [2]. 

Riggs  et  al.  state  that  the  TLS  solution  is  variant  under  linear  transformations  (that 
is,  incorrectly  dimensioned),  and  cite  several  papers  for  the  correctly  dimensioned  result 

^  szz  —  ^sxx  +  y  ( szz  ~  ^ sxx )2  +  4A  s|- 


where  A  sww/suu  (that  is,  ratio  of  z  noise  to  x  noise).  Johnston  [12,  Eq.  (6-15)]  obtains 
the  result  shown  above,  except  the  sign  in  front  of  the  radical  becomes  ±,  and  terms  this 
the  “classical  errors-in-variables”  approach.  Johnston  states  that  the  sign  in  front  of  the 
radical  is  determined  by  the  sign  of  the  covariance  between  the  input  and  output  signal 
(that  is,  positive  if  >  0  and  negative  otherwise). 

As  the  ratio  of  z  noise  to  x  noise  tends  to  infinity  (that  is,  A  — *  oo)  the  above  expression 
becomes  the  classical  LS  technique  (minimising  the  vertical  distance  between  LS  line 
and  the  measured  points).  Alternatively  as  the  ratio  of  z  noise  to  x  noise  tends  to  zero 
(that  is,  A  — >  0)  the  above  expression  becomes  the  LS  technique  for  the  regression  of 
x  on  z  (minimising  the  horizontal  distance  between  LS  line  and  the  measured  points). 
Johnston  [12,  p.  155]  makes  the  same  observations,  and  goes  on  to  state  that  these  two 
special  cases  (A  — >  0  and  A  — >■  oo)  of  the  conventional  LS  solution  may  be  regarded  as 
extreme  limiting  cases  of  the  general  errors-in-variables  model. 

Due  to  the  empirical  nature  of  their  work,  Riggs  et  al.  found  it  hard  to  draw  definite 
conclusions. 


5.3  Variance  of  the  Corrected  LS  and  the  Total  LS 

In  this  subsection  we  compare  the  variance  of  both  the  corrected  LS  and  the  total 
least  squares  (TLS)  solutions  for  a  general  n-dimensional  system.  In  particular  we  review 
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results  that  compare  the  variance  of  these  two  techniques,  and  show  that  TLS  always  has 
a  larger  variance  than  LS  (despite  the  fact  that  TLS  is  consistent  and  LS  is  not). 

Assume  the  errors  in  all  the  data  are  independently  and  identically  distributed  with 
zero  mean  and  the  common  variance  matrix  a2 1  (where  a2  is  the  error  variance).  Then 
Van  Huffel  and  Vandewalle  [35,  p.  234]  show  that  the  TLS  solution  is  a  strongly  consistent 
estimate  of  the  parameters  in  the  linear  errors- in- variables  model. 

Consider  the  linear  system  given  by  Equation  (4.41),  and  the  related  approximate 
linear  system 

Xa  —  z, 

where  X  =  X  +  £7,  z  =  z  +  w,  and  U  and  w  are  the  matrix  and  vector  of  input  and 
output  noise  respectively  .  The  closed-form  expression  for  the  basic  TLS  solution  is  given 
by  [35,  Eq.  (2.27)] 

aTLS  =  (  XTX  -  al+^y1  XTz, 

where  an+i  is  the  smallest  singular  value  of  the  augmented  matrix  [X ,  z]  £  Mmx(n+1). 
The  above  solution,  however,  should  not  be  used  in  the  computation  of  the  TLS  solution. 
The  generalised  TLS  problem  (which  considers  a  matrix  Z  of  outputs  instead  of  a  vector 
z)  is  given  by  Van  Huffel  and  Vandewalle  [34],  who  also  provide  a  robust  algorithm  for 
the  computation  of  the  TLS  solution  (based  on  generalised  SVD). 

Fixed  sample  size  distribution  theory  is  termed  “unwieldy”  by  Van  Huffel  and  Van¬ 
dewalle  [34],  and  hence  they  consider  only  asymptotic  distributions  (that  is,  large  sam¬ 
ple  sizes).  Assume  the  errors  in  the  input  and  output  are  independent  and  identically 
distributed  with  common  zero  mean  vector  and  a  common  covariance  matrix  (cr2In+1). 
Furthermore,  assume  that  the  covariance  matrix  of  exact  inputs 

£*=  lim  —  XTX 

777  — t-OO  m 


exists,  or  in  other  words  assume  that  there  exists  a  linear  solution  to  the  system  (4.41). 
Referring  to  Equation  (4.44)  we  see  that  the  above  covariance  matrix  is  the  limiting  case 
of  the  sample  covariance  matrix,  that  is,  =  lim m-^oo  Sx.  The  asymptotic  distribution 

of  y/m(aTLS  —  a)  is  then  multivariate  normal  with  zero  means  [34,  p.  241]. 

Variance  results  are  obtained  if  we  further  assume  that  is  positive  definite  (that  is, 
the  underlying  system  X  has  full  rank)  and  that  the  errors  posses  the  third  and  fourth 
moments  of  a  normal  distribution.  (Remember  that  a  matrix  has  full  rank  if  all  the 
singular  values  are  positive,  see  §  2.)  Using  these  two  additional  assumptions,  we  know 
that  y/m{aTLS  —  a)  is  asymptotically  normally  distributed  with  zero  mean  and  covariance 
matrix  [34,  Eq.  (8.45)] 


varoo  [Vm(aTLS  -  a)]  =  a2  (l  +  Halil) 


Ea.  1  +  <t2Y,x  1  (ln  +  aaT)  1 1  , 


(5.1) 


which  may  be  approximated  as  <j2(1  +  ||a||2)Xx_1  for  a  small  amount  of  noise  (that  is, 
a  <C  1).  The  above  expression  differs  from  a  result  given  by  Schneeweifi  [30]  as  we  now 
show. 
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Define  as  the  covariance  matrix  of  input  errors 

£u=  lim  —UTU 

m— >00  m 

~  diag  (£(^i)2,...,£(un)2)  , 

where  uj  represents  the  noise  on  the  jth  input  variable  (which  is  the  jth  column  of  U ). 
The  second  line  of  the  above  displayed  equation  follows  from  the  first  line  since  we  have 
assumed  that  the  input  noises  are  uncorrelated.  Also  define  the  kurtosis  matrix  as 

K  =  diag(&i, . . . ,  /cn),  where  kj  =  a 2  | £  (uj)  —  3  [£  (w^)]  2|  for  j  =  1, . . .  ?  n 

and  aj  is  the  jth  element  of  the  coefficient  vector  a.  It  is  interesting  to  note  that  kj /a2 
is  exactly  the  definition  of  the  fourth  cumulant  about  the  mean,  see  for  example  Kendall 
and  Stuart  [14,  Eq.  (3.43)].  The  asymptotic  variance  of  the  corrected  LS  error  is  then 
given  by  [30,  Eq.  (5.4)] 

var ooW^{aCLS-a)\  =  E [f  (z2)  +  arSua]  Sj  +  K  +  (5.2) 

^  J1  ^ 

where  =  limm^00(l  /m)X  X  . 

The  TLS  solution  is  simply  a  special  case  of  the  corrected  LS  solution  [34,  p.  243],  in 
that  for  the  TLS  solution  all  noise  terms  are  assumed  to  have  the  same  magnitude  (that 
is,  £(z2)  =  a2  and  =  a2 1).  Using  this  assumption  together  with  the  assumption  that 
the  fourth  moment  is  the  same  as  that  of  a  normal  distribution  (that  is,  K  —  0),  then 
Equation  (5.2)  becomes 

vaioo  [>/m  (aTLS  -  a)]  =  a2  ( 1  +  aTa)  a  aT S^1.  (5.3) 

From  the  definition  of  X  we  have  that  X  =  X  +  U  and  hence 

-X  Tx  =  -  (XTX  +  XTU  +  UTX  +  UTU) . 

m  m  ' 

Taking  the  limit  as  m  — >  oo,  remembering  that  we’ve  assumed  the  input  and  noise  are 
uncorrelated,  we  obtain 


—  5] x  + 

“  ^x  +  V2 In- 

Equation  (5.3)  may  then  be  written  as 


(5.4) 


vnroo  [\/m  {O'tls  ~~  ®)]  —  <r2  (l  +  1 1 I2) 

Equations  (5.1)  and  (5.5)  are  clearly  different.  The  difference,  however,  is  not 
first  appearances  may  suggest. 


-1 


+  a2£; 


-1 


In  + 


aax 


1  +  I  lot  I 


(5.5) 
as  great  as 


It  can  easily  be  shown  that  taking  the  product  of  the  two  expressions  In  ±  aaT  and 
In  =F  aaT/ (1  +  | HI2)  (note  the  opposite  signs  in  front  of  the  terms  aaT)  gives  the  identity 
matrix  (see  Appendix  A).  We  then  have  that 


In 


1  +  ||a| 


aa 


(In  +  aaT)  \ 
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where  we’ve  made  use  of  the  fact  that  In  +  aaT  is  invertible  (see  Appendix  A).  Thus  it 
appears  as  if  there  is  a  typographical  error  in  one  of  the  two  equations  for  the  covariance 
matrix  of  the  TLS  method  (that  is,  Equation  (5.1)  or  (5.5)  derived  by  Van  Huffel  and 
Vandewalle  [35]  and  Schneeweifi  [30]  respectively).  Perhaps  the  sign  in  front  of  the  term 
aaT  should  be  reversed  in  one  of  these  equations?  At  any  rate,  for  small  amounts  of  noise 
(that  is,  a  <C  1)  Equations  (5.1)  and  (5.5)  agree  more  closely  (since  the  second  term  in 
both  equations  becomes  small). 

For  completeness  we  also  cite  the  variance  of  the  corrected  LS  derived  independently 
by  Naidu  [22]  as  var^  ( aCLs  -  a)  =  (a2 /m)(E®  -  Ew)“1Ej(S^  -  £w)_1.  Using  the  result 
given  by  Equation  (5.4),  this  variance  expression  may  be  re-written  as 

varoo  [Vm(aTLS  -  a)]  =  cr2^-1  +  (5.6) 

which  is  different  again  from  the  two  results  already  quoted  in  Equations  (5.1)  and  (5.5). 
This  difference,  however,  may  be  due  to  a  being  defined  (somewhat  unclearly)  as  the  norm 
of  the  residual  (that  is,  if  e  =  X  a  —  z  then  £(eeT)  =  a2 In,  it  is  unclear  whether  or  not 
a  —  aLS) .  Finally,  a  result  with  similar  form  to  Equations  (5.1),  (5.5),  and  (5.6)  is  given 
by  Fuller  [7]  in  Theorem  2.3.2  and  Equation  2.3.26,  which  are  not  included  in  this  report 
due  to  the  complexity  in  understanding  the  development. 

Using  Equation  (5.1)  the  covariance  of  the  TLS  solution  can  be  approximated  by  the 
matrix  [34,  Eq.  (8.46)] 

cov(aTLS)  «  a2(l  +  HaHaX-X^X)-1,  (5.7) 

as  compared  to  the  covariance  of  the  LS  solution 

cov(ats)  =  a2(XTX)~\ 

From  the  above  two  equations  we  see  that  the  TLS  solution  always  has  a  larger  covariance 
than  the  LS  solution.  We  have  already  come  across  this  phenomenon  for  the  univariate 
corrected  LS  case  in  §  4.5,  see  especially  Equation  (4.38). 

For  a  finite  sample  size  Equation  (5.7)  may  be  approximated  by  [34,  Eq.  (8.47)] 

cov(oTLS)  «  cr2(l  +  \\a\\2)(XT X  -  mcr2Jn)-1, 

where  a2  «  cr2+ l/m  and  an+i  is  the  (n  +  l)th  singular  value  of  the  augmented  matrix 
[X ,  z\.  (Remember  that  m  is  the  number  of  measurement  samples.) 

In  this  section  we  have  shown  that  despite  the  LS  procedure  being  inconsistent  and 
the  TLS  being  consistent,  the  variance  of  the  TLS  procedure  is  larger  than  that  of  the  LS 
procedure.  Furthermore,  since  the  TLS  is  a  special  form  of  the  corrected  LS,  we  suspect 
that  the  variance  of  the  corrected  LS  procedure  is  also  larger  than  that  of  the  LS  procedure 
(a  theory  our  truss  simulations  support). 


5.4  Simulations  Results 

In  this  subsection  we  present  some  numerical  simulations  to  support  the  theoretical 
work  of  the  previous  subsection. 
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Van  Huffel  and  Vandewalle  [34]  performed  Monte  Carlo  simulations  and  found  that 
even  for  moderate  sample  sizes  (20  observations  or  more)  the  asymptotic  results  detailed 
above  provided  good  approximations.  In  their  simulations  they  use  the  same  measure  as 
Ketellapper  (see  Equation  (4.39)),  namely  the  mean  square  error  (MSE)  given  by 


MSE  (a)  =  £  [{a-af  (a  -  a) 

=  £  {[a-  £(a)  +  £(a)  -  a]T  [a  -  £(a)  +  £{a)  -  a]| 

=  £  { [“  -  £(a)}T  [2  -  £(a)} }  +  £  { [a  -  £(a)]T  [5(a)  -  a] } 

+  £  {[5(a)  -  a]T  [a  -  £  (a)]}  +  £  {[5(a)  -  af  [£(a)  -  a]} 

=  £  ([a  —  £(a)]  [a  —  £(a)]|  +  [£(a)  ~  a\T  [£(2)  —  a] 

=  total  variance  (a)  +  squared  bias  (a), 


(5.8) 


(5.9) 


the  cross-product  terms  being  equal  to  zero  since  £[a-£(a)]  =  0.  (For  a  further  discussion 
on  the  MSE  measure  see  §  4.5  and  also  Kendall  and  Stuart  [15].)  The  squared  bias,  total 
variance,  and  MSE  of  the  simulation  results  were  normalised  by  the  Frobenius  norm  of 
the  solution  ||o||^  and  plotted  on  log-log  scales.  Van  Huffel  and  Vandewalle  then  make 
the  following  observations. 


•  TLS  bias  is  much  smaller  than  that  of  LS,  and  both  biases  decrease  as  the  number 
of  observations  m  increases.  This  bias  difference  is  more  pronounced  when  the  noise 
variance  increases,  m  increases,  X  is  ill-conditioned,  or  a  is  orientated  towards  the 
null  space  of  X. 

•  TLS  total  variance  is  larger  than  that  of  LS.  The  rank  reduction  of  X ,  due  to  large 
noise  variances,  causes  an  increase  in  bias  but  a  decrease  in  variance  of  the  solution. 

•  TLS  and  LS  solutions  have  comparable  MSE  for  small  noise  variances,  but  for  large 
noise  variances  the  TLS  performs  better  under  the  MSE  measure. 

Van  Huffel  and  Vandewalle  cite  other  authors  as  giving  similar,  though  more  detailed, 
conclusions. 

Heavy-tailed  error  distributions  and  outliers  both  deteriorate  the  performance  of  the 
TLS,  where  the  clear  dominance  of  TLS  over  LS  is  only  apparent  for  large  sample  sizes  (m 
large).  These  results  are  backed  by  both  theoretical  analysis  and  Monte  Carlo  simulations 
(see  [35,  p.  248]  and  citations  therein). 
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6  Conclusion 

We  reviewed  singular  value  decomposition  (SVD)  and  determined  when  a  two-by-two 
system  would  be  both  well-  and  ill-conditioned.  We  have  seen  that  the  SVD  decomposition 
of  a  matrix  is  simply  an  orthogonalisation  of  the  matrix  into  a  sum  of  rank  one  matrices. 
The  condition  number  of  a  two-by-two  matrix  allowed  us  to  investigate  the  relation  between 
gauge  configuration  and  ill-conditioning.  A  two-by-two  matrix  becomes  ill-conditioned 
when  the  determinate  is  zero,  and  is  perfectly- conditioned  when  it  has  a  symmetric-like 
form. 

We  then  explained  how  the  vector  and  matrix  techniques  worked.  The  development 
of  a  simple  determinate  truss  allowed  us  to  investigate  the  effects  of  gauge  placement  and 
measurement  noise  on  the  development  of  stress  transfer  functions.  To  obtain  sensible 
stress  solutions  for  a  truss  with  two  external  loads,  the  minimum  number  of  beam  members 
required  was  found  to  be  five.  We  developed  the  stress  equations  for  each  member  of  this 
truss,  with  beams  of  varying  length  (and  hence  varying  truss  geometries).  We  used  this 
protean  truss  in  simulations  to  support  theoretical  developments. 

Assuming  our  predictive  system  was  linear,  we  developed  a  relationship  between  the 
exact  coefficients  and  the  approximate  coefficients.  This  assumption  merely  requires  that 
the  predictive  system  be  a  linear  function  of  its  variable,  and  not  that  the  variables  be 
linear  in  terms  of  the  some  global  variables. 

Further  assuming  that  the  noise  is  uncorrelated,  we  saw  that  as  the  two  inputs  become 
more  correlated  the  troublesome  denominator  tends  to  zero,  and  hence  the  error  tends 
to  infinity.  Thus  our  coefficient  estimates  (for  the  predictive  system)  deteriorate  as  the 
system  becomes  more  ill-conditioned  (even  when  only  a  small  amount  of  noise  is  present). 

To  determine  the  behaviour  of  these  coefficient  estimations  we  further  assumed  that 
the  two  input  signals  and  associated  noises  had  the  same  variance.  The  error  in  the 
approximating  coefficients  is  zero  under  two  conditions:  when  the  noise  is  zero  or  when 
the  noise  exactly  cancels  the  correlation. 

Assuming  that  the  noise  was  much  closer  to  zero  than  the  correlation  was  to  unity,  we 
gained  a  greater  understanding  of  the  effects  of  exact  coefficient  ratio  on  errors.  A  contour 
plot  of  the  noise  amplifying  function  emphasised  the  importance  of  having  input  signals  of 
a  similar  order  of  influence  on  the  output  signal.  We  found  that  if  the  exact  coefficients  had 
the  same  magnitude,  then  noise  was  no  longer  a  problem,  even  for  ill-conditioned  systems. 
On  the  contrary,  under  this  condition  noise  is  beneficial.  The  well  known  result  of  least 
squares  (LS)  under-prediction  (from  errors-in-variables  theory)  was  observed.  From  the 
contour  plot  we  also  discovered  when  the  error  in  coefficient  estimation  was  zero,  and 
under  what  conditions  the  noise  would  be  amplified  detrimentally. 

The  LS  technique  (and  hence  the  vector  technique)  was  unbiased  but  inconsistent,  and 
hence  we  developed  a  correction  for  the  inconsistency.  We  developed  conditions  under 
which  this  LS  correction  might  be  sensitive  to  noise  estimates,  and  showed  that  the  use  of 
bad  noise  estimations  can  easily  send  the  correction  to  infinity.  The  superior  performance 
of  the  matrix  technique  resulted  from  the  elimination  of  the  troublesome  denominator  (a 
function  of  correlation),  making  the  matrix  technique  consistent. 

We  determined  both  the  correlation  and  condition  number  of  the  five  member  truss 
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used  for  simulations.  Correlated  input  signals  do  not  imply  an  ill-conditioned  system,  and 
conversely,  an  ill-conditioned  system  does  not  imply  correlated  input  signals.  Due  to  the 
complexity  in  developing  inequalities  for  near  unit  correlation  and  near  ill-conditioning,  we 
resorted  to  truss  simulations.  Unexpectedly,  but  rather  fortunately,  the  simulations  sug¬ 
gest  that  the  correlation  between  input  signals  is  always  better  than  the  system  predicted 
ill-conditioning. 

We  have  seen  that  the  corrected-vector  technique  (based  on  corrected  LS)  is  more 
accurate  than  the  ordinary- vector  technique  (at  least  for  large  approximation  orders). 
The  accuracy,  however,  is  still  inferior  to  that  of  the  matrix  technique,  which  is  most 
likely  due  to  the  sensitivity  of  errors  in  noise  estimation. 

The  surrogate-matrix  technique,  which  substituted  additional  gauges  for  external  loads, 
performed  as  well  as  was  theoretically  predicted.  Like  the  matrix  technique,  the  surrogate- 
matrix  technique  eliminated  the  troublesome  denominator  found  in  the  vector  technique. 
Of  all  the  techniques  sampled  thus  far,  the  matrix-based  techniques  have  the  highest  ac¬ 
curacy,  but  require  additional  gauges  as  compared  to  the  vector-based  techniques.  We 
also  noted  that  the  vector-based  techniques  perform  with  comparable  accuracy  to  the 
matrix-based  techniques  for  well  behaved  systems. 

Despite  the  LS  procedure  being  inconsistent  and  the  total  least  squares  (TLS)  being 
consistent,  the  variance  of  the  TLS  procedure  is  larger  than  that  of  the  LS  procedure. 
A  review  of  the  TLS  technique  demonstrated  that  for  the  development  of  stress  transfer 
functions  the  corrected  LS  technique  was  better  than  the  TLS,  since  the  TLS  was  simply 
a  special  case  of  the  corrected  LS.  These  results  explain  why  the  variance  of  the  corrected- 
vector  technique  was  larger  than  the  variance  of  the  ordinary- vector  technique. 

In  conclusion  we  recommend  the  use  of  the  surrogate-matrix  technique  for  ill-conditioned 
systems,  due  to  its  accuracy  (in  developing  predictive  linear  models)  and  ease  of  implemen¬ 
tation  (no  external  loading  is  required).  For  well  conditioned  systems,  we  recommend  the 
or  dinary- vector  technique,  due  to  its  ease  of  implementation  (requires  fewer  gauges  than 
the  surrogate-matrix  technique)  and  lower  variance  (as  compared  to  the  corrected-vector 
technique) . 
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Appendix  A:  Properties  of  the  Rank-One 

Matrix  aaT 


In  this  appendix  we  derive  several  properties  of  the  matrix  aaT,  where  a  is  a  vector.  In 
particular  we  show  that  aaT  is  idempotent-like  and  positive  semidefinite.  We  finally  use 
these  results  to  prove  the  invertibility  of  the  matrix  I  +  aaT . 


Consider  the  matrix  £a  =  aaT,  where  the  vector  a  G  Rnxl  and  hence  the  matrix 
£a  G  Rnxn.  Let  the  vector  a  have  the  elements  a\  for  i  =  1, . . . ,  n,  then  the  matrix  £a 
has  the  form 


a\ 

CL2CL1 

aid2  • ■ 
a\  ■■ 

•  aian 

0*2  &TI 

Ctn^l 

an(i2  *  ■ 

•  On  . 

(Al) 


We  see  that  the  matrix  £a  must  be  symmetric  and  also  rank  one  (or  rank  zero  for  a  =  0) 
since  all  rows  are  linearly  dependent.  Notice  from  the  above  equation  that  £a  ^  In  for 
any  vector  a  since  considering,  for  example,  the  first  two  elements  in  the  equation  £a  =  In 
we  arrive  at  the  contradiction  a\  =  <12  =  ±1  but  =  0.  Alternatively,  we  know  that 
rank(Jn)  —  n,  and  hence  £a  ^  In. 

We  also  see  that  if  ||a||2  =  1  then  the  matrix  £a  is  idempotent  (that  is  £a2  =  £a). 
More  generally  we  also  have  that 


0  T  T 

2 ja  =  eta  aa 

H.  1 0  T 

a\  \2a 

=  Iklll^a,  (A2) 

that  is  the  square  of  the  matrix  £a  is  a  scaler  multiple  (Hall2)  of  itself.  Johnston  and 
DiNardo  [13,  p.  483]  prove  that  each  eigenvalue  of  an  idempotent  matrix  is  either  zero  or 
one.  Using  the  above  idempotent-like  information  we  can  easily  generalise  that  proof  to 
show  that  the  eigenvalues  of  the  matrix  £a  are  either  zero  or  |  |a|  We  then  know  that  the 
matrix  £a  must  be  positive  semidefinite  (that  is,  all  eigenvalues  of  £a  are  non-negative). 

Let  A  and  z  be  an  eigenvalue  and  the  corresponding  eigenvector  of  the  matrix  £a  and 
let  B  =  In  +  £a.  By  definition  we  then  have  that  £az  =  Az,  and  upon  adding  z  to  both 
sides  gives  that 


z  +  £az  =  z  +  Az 
(In  +  £a)z  =  (1  +  A)z. 

(For  a  similar  proof  see  [13,  p.  482].)  Remember,  however,  that  £a  is  positive  semidefinite, 
and  hence  the  matrix  B  must  be  positive  definite  (that  is,  all  eigenvalues  of  B  are  positive). 
We  know  that  the  eigenvalues  and  singular  values  of  a  symmetric  matrix  are  identical,  and 
since  B  is  positive  definite  it  must  also  be  invertible  (refer  to  §  2  for  an  elucidation  of 
these  concepts). 
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Finally,  let’s  consider  the  following  product 

1 


(In  +  Sa)  I 


l  +  l|a|ira 


—  In 


1 


i  +  IWIi 


Sa- 


1  +  |  |a|  Ig 


2 


,  ,  i+]Mj|-i  1 
1  +  IHII  a  I  +  IMI2 


2  Ilall2^a 


=  In- 

Since  In  +  Sa  is  invertible  we  have  that 

1 


In- 


i  +  NI 


2  —  (In  "F  S a 


-1 
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Index 


2- nor m,  see  norm,  two 

amplifying  function 
noise  28 

AMSE,  see  asymptotic  mean  square  error 
assumption 
~s,  2 

linear  22 

uncorrelated  noise  25 
asymptotic 

mean  square  error,  37 

see  also  mean  square  error 
asymptotic  bias,  30 

bad 

gauges,  53 
system,  53 
bias,  68 

attenuation,  28 
correction,  see  jack-knife 
box-plot,  19,  51 

centroid,  40,  63 

CLS,  see  least  squares,  corrected  ~ 
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